2 Univariate Stationary Processes




As mentioned in the introduction, the publication of the textbook by
GEORGE E.P. BOX and GWILYM M. JENKINS in 1970 opened a new road to
the analysis of economic time series. This chapter presents the Box-Jen-
kins Approach, its different models and their basic properties in a rather
elementary and heuristic way. These models have become an indispensa-
ble tool for short-run forecasts. We first present the most important ap-
proaches for statistical modelling of time series. These are autoregressive
(AR) processes (Section 2.1) and moving average (MA) processes (Section
2.2), as well as a combination of both types, the so-called ARMA process-
es (Section 2.3). In Section 2.4 we show how this class of models can be
used for predicting the future development of a time series in an optimal
way. Finally, we conclude this chapter with some remarks on the relation
between the univariate time series models described in this chapter and the
simultaneous equations systems of traditional econometrics (Section 2.5).


2.1 Autoregressive Processes

We know autoregressive processes from traditional econometrics: Already
in 1949, DONALD COCHRANE and GUY H. ORCUTT used the first order au-
toregressive process for modelling the residuals of a regression equation.
We will start with this process, then treat the second order autoregressive
process and finally show some properties of autoregressive processes of an
arbitrary but finite order.


2.1.1 First Order Autoregressive Processes

Derivation of Wold’s Representation

A first order autoregressive process, an AR(1) process, can be written as
an inhomogeneous stochastic first order difference equation,
(2.1)                                x_t = δ + α x_{t-1} + u_t,


where the inhomogeneous part δ + u_t consists of a constant term and a
pure random process u_t. Let us assume that for t = t_0 the initial value x_{t_0} is
given. By successive substitution in (2.1) we get
         x_{t_0+1} = δ + α x_{t_0} + u_{t_0+1}

         x_{t_0+2} = δ + α x_{t_0+1} + u_{t_0+2}
                   = δ + α(δ + α x_{t_0} + u_{t_0+1}) + u_{t_0+2}
                   = δ + αδ + α² x_{t_0} + α u_{t_0+1} + u_{t_0+2}

         x_{t_0+3} = δ + α x_{t_0+2} + u_{t_0+3}
                   = δ + αδ + α²δ + α³ x_{t_0} + α² u_{t_0+1} + α u_{t_0+2} + u_{t_0+3}
              ⋮
         x_{t_0+τ} = (1 + α + α² + … + α^{τ-1})δ + α^τ x_{t_0}
                      + α^{τ-1} u_{t_0+1} + α^{τ-2} u_{t_0+2} + … + α u_{t_0+τ-1} + u_{t_0+τ},

or

         x_{t_0+τ} = α^τ x_{t_0} + ((1 − α^τ)/(1 − α)) δ + Σ_{j=0}^{τ-1} α^j u_{t_0+τ-j} .

For t = t_0 + τ, we get

(2.2)    x_t = α^{t-t_0} x_{t_0} + ((1 − α^{t-t_0})/(1 − α)) δ + Σ_{j=0}^{t-t_0-1} α^j u_{t-j} .


The development and thus the properties of this process are mainly determined by the assumptions on the initial condition x_{t_0}.
   The case of a fixed (deterministic) initial condition is given if x_0 is assumed to be a fixed (real) number, for example for t_0 = 0, i.e. no random variable. Then we can write:

         x_t = α^t x_0 + ((1 − α^t)/(1 − α)) δ + Σ_{j=0}^{t-1} α^j u_{t-j} .


This process consists of time dependent deterministic and stochastic parts.
Thus, it can never be weakly stationary, since first and second order moments are time dependent. It is, however, asymptotically stationary because the time dependence vanishes for t_0 → −∞.
   We can imagine the case of stochastic initial conditions as (2.1) being generated along the whole time axis, i.e. −∞ < t < ∞. If we observe the process only for positive values of t, the initial value x_0 is a random variable which is generated by this process. Formally, the process with stochastic initial conditions results from (2.2) if the solution of the homogeneous difference equation has disappeared. This is only possible if |α| < 1. Therefore, in the following, we restrict α to the interval −1 < α < 1. If lim_{t_0→−∞} x_{t_0} is bounded, (2.2) converges for t_0 → −∞ to

(2.3)    x_t = δ/(1 − α) + Σ_{j=0}^{∞} α^j u_{t-j} .

The time dependence has disappeared. According to Section 1.5, the AR(1)
process (2.1) has the Wold representation (2.3) with ψ_j = α^j and |α| < 1.
This results in the convergence of

         Σ_{j=0}^{∞} ψ_j² = Σ_{j=0}^{∞} α^{2j} = 1/(1 − α²) .
Thus, assuming stochastic initial conditions, the process (2.1) is weakly
stationary.
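As a practical illustration, the following minimal sketch (Python with NumPy; purely hypothetical values δ = 1, α = 0.8) starts the recursion (2.1) from two very different fixed initial values and shows that their influence dies out because |α| < 1; discarding such a burn-in period is the usual way of mimicking stochastic initial conditions in a simulation.

```python
import numpy as np

# Minimal sketch (hypothetical values delta = 1, alpha = 0.8): the AR(1)
# recursion (2.1) is started from two very different fixed initial values.
# Because |alpha| < 1 the influence of the initial value dies out
# geometrically, which is what the assumption of stochastic initial
# conditions captures.
rng = np.random.default_rng(0)
delta, alpha, T = 1.0, 0.8, 200
u = rng.standard_normal(T)            # pure random process with variance 1

def ar1_path(x0):
    x, x_prev = np.empty(T), x0
    for t in range(T):
        x_prev = delta + alpha * x_prev + u[t]   # equation (2.1)
        x[t] = x_prev
    return x

path_a, path_b = ar1_path(50.0), ar1_path(-50.0)
print(np.max(np.abs(path_a[50:] - path_b[50:])))  # practically zero
```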

The Lag Operator

Equation (2.3) can also be derived from relation (2.1) by using the lag operator defined in Section 1.3:

(2.1')                 (1 − αL) x_t = δ + u_t .

If we solve for x_t we get

(2.4)                  x_t = δ/(1 − αL) + (1/(1 − αL)) u_t .

The expression 1/(1 − αL) can formally be expanded to a geometric series,

         1/(1 − αL) = 1 + αL + α²L² + α³L³ + … .

Thus, we get

         x_t = (1 + αL + α²L² + …)δ + (1 + αL + α²L² + …)u_t
             = (1 + α + α² + …)δ + u_t + α u_{t-1} + α² u_{t-2} + … ,

and because of |α| < 1

         x_t = δ/(1 − α) + Σ_{j=0}^{∞} α^j u_{t-j} .


The first term could have been derived immediately if we substituted the
value ‘1’ for L in the first term of (2.4). (See also relation (1.8) on p. 11).
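The following sketch (hypothetical α, δ = 0, NumPy assumed) checks this inversion numerically: applying the truncated geometric-series weights α^j to the innovations reproduces, up to a small truncation error, the series generated recursively from (2.1).

```python
import numpy as np

# Sketch of the geometric-series inversion of (1 - alpha*L) with delta = 0
# and a hypothetical alpha: applying the truncated weights alpha^j to the
# innovations reproduces the recursively generated AR(1) series up to a
# small truncation error.
rng = np.random.default_rng(1)
alpha, T, J = 0.8, 300, 60                 # J: truncation point of the series

u = rng.standard_normal(T + J)
x_rec = np.zeros(T + J)                    # recursive solution of (2.1)
for t in range(1, T + J):
    x_rec[t] = alpha * x_rec[t - 1] + u[t]

psi = alpha ** np.arange(J)                # weights alpha^j, j = 0, ..., J-1
x_ma = np.array([psi @ u[t - np.arange(J)] for t in range(J, T + J)])

print(np.max(np.abs(x_rec[J:] - x_ma)))    # small: both representations agree
```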

Calculation of Moments

Due to representation (2.3), the first and second order moments can be calculated. As E[u_t] = 0 holds for all t, we get for the mean

         E[x_t] = E[ δ/(1 − α) + Σ_{j=0}^{∞} α^j u_{t-j} ]

                = δ/(1 − α) + Σ_{j=0}^{∞} α^j E[u_{t-j}] = δ/(1 − α) = μ ,

i.e. the mean is constant. It is different from zero if and only if δ ≠ 0. Because of 1 − α > 0, the sign of the mean is determined by the sign of δ. For the variance we get

         V[x_t] = E[(x_t − δ/(1 − α))²] = E[( Σ_{j=0}^{∞} α^j u_{t-j} )²]

                = E[(u_t + α u_{t-1} + α² u_{t-2} + ...)²]

                = E[u_t² + α² u_{t-1}² + α⁴ u_{t-2}² + … + 2α u_t u_{t-1} + 2α² u_t u_{t-2} + … ]

                = σ²(1 + α² + α⁴ + ...),

because E[u_t u_s] = 0 for t ≠ s and E[u_t u_s] = σ² for t = s. Applying the summation formula for the geometric series, and because of |α| < 1, we get the constant variance

         V[x_t] = σ²/(1 − α²) .

The covariances can be calculated as follows:

         Cov[x_t, x_{t-τ}] = E[(x_t − δ/(1 − α))(x_{t-τ} − δ/(1 − α))]

                = E[(u_t + α u_{t-1} + ... + α^τ u_{t-τ} + ...)
                      (u_{t-τ} + α u_{t-τ-1} + α² u_{t-τ-2} + ...)]

                = E[((u_t + α u_{t-1} + ... + α^{τ-1} u_{t-τ+1})
                      + α^τ (u_{t-τ} + α u_{t-τ-1} + α² u_{t-τ-2} + ...))
                      (u_{t-τ} + α u_{t-τ-1} + α² u_{t-τ-2} + ...)]

                = α^τ E[(u_{t-τ} + α u_{t-τ-1} + α² u_{t-τ-2} + ... )²] .

Thus, we get

         Cov[x_t, x_{t-τ}] = α^τ V[x_{t-τ}] = α^τ σ²/(1 − α²) .

The autocovariances are only a function of the time difference and not of
time t, and we can write:

(2.5)    γ(τ) = α^τ σ²/(1 − α²) ,    τ = 0, 1, 2, ... .

Therefore, the AR(1) process with |α| < 1 and stochastic initial conditions
is weakly stationary.
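A small numerical check of (2.5), under the assumption that NumPy is available and with a hypothetical α: the sample autocovariances of a long simulated realisation should be close to α^τ σ²/(1 − α²).

```python
import numpy as np

# Numerical check of (2.5) with a hypothetical alpha: the sample
# autocovariances of one long simulated realisation should be close to
# gamma(tau) = alpha^tau * sigma^2 / (1 - alpha^2).
rng = np.random.default_rng(2)
alpha, sigma2, T = 0.7, 1.0, 100_000

u = np.sqrt(sigma2) * rng.standard_normal(T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = alpha * x[t - 1] + u[t]

def sample_autocov(x, tau):
    xd = x - x.mean()
    return np.dot(xd[: len(x) - tau], xd[tau:]) / len(x)

for tau in range(5):
    theo = alpha ** tau * sigma2 / (1.0 - alpha ** 2)
    print(tau, round(theo, 3), round(sample_autocov(x, tau), 3))
```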

An Alternative Method for the Calculation of Moments

Under the condition of weak stationarity, i.e. for |α| < 1 and stochastic initial conditions, the mean of x_t is constant. If we apply the expectation operator on equation (2.1), we get:

         E[x_t] = E[δ + α x_{t-1} + u_t] = δ + α E[x_{t-1}] + E[u_t] .

Because of E[u_t] = 0 and E[x_t] = E[x_{t-1}] = μ for all t we can write

         E[x_t] = μ = δ/(1 − α) .

If we consider the deviations from the mean,

         x̃_t = x_t − μ

and substitute this in relation (2.1), we get:

         x̃_t + μ = δ + α(x̃_{t-1} + μ) + u_t .

From this it follows that

         x̃_t = δ + μ(α − 1) + α x̃_{t-1} + u_t

             = δ + (δ/(1 − α))(α − 1) + α x̃_{t-1} + u_t

(2.6)    x̃_t = α x̃_{t-1} + u_t .

This is the AR(1) process belonging to (2.1) with E[x̃_t] = 0.
   If we multiply equation (2.6) with x̃_{t-τ} for τ ≥ 0 and take expectations
we can write:

(2.7)    E[x̃_{t-τ} x̃_t] = α E[x̃_{t-τ} x̃_{t-1}] + E[x̃_{t-τ} u_t] .

Because of (2.3) we get

         x̃_{t-τ} = u_{t-τ} + α u_{t-τ-1} + α² u_{t-τ-2} + … .

This leads to

(2.8)    E[x̃_{t-τ} u_t] = σ²  for τ = 0,   and   E[x̃_{t-τ} u_t] = 0  for τ > 0 .

Because of the stationarity assumption and because of the (even) symmetry of the autocovariances, γ(τ) = γ(−τ), equation (2.7) results in

         τ = 0:   E[x̃_t²] = α E[x̃_t x̃_{t-1}] + σ² ,    or   γ(0) = α γ(1) + σ² ,
         τ = 1:   E[x̃_t x̃_{t-1}] = α E[x̃_{t-1}²] ,      or   γ(1) = α γ(0) .

This leads to the variance of the AR(1) process

         γ(0) = σ²/(1 − α²) .

For τ ≥ 1 (2.7) implies

         γ(1) = α γ(0)
         γ(2) = α γ(1) = α² γ(0)
         γ(3) = α γ(2) = α³ γ(0)
            ⋮
         γ(τ) = α γ(τ-1) = α^τ γ(0) .

Thus, the covariances can be calculated from the linear homogeneous first
order difference equation

         γ(τ) − α γ(τ-1) = 0

with the initial value γ(0) = σ²/(1 − α²).

The Autocorrelogram

Because of ρ(τ) = γ(τ)/γ(0), the autocorrelation function (the autocorrelogram) of the AR(1) process is

(2.9)                  ρ(τ) = α^τ ,    τ = 1, 2, ... .

This function converges geometrically to zero for τ → ∞, and its infinite
sum equals 1/(1 − α) since |α| < 1. This convergence is monotone for positive and oscillating for negative values of α.

Example 2.1

For δ = 0 and α ∈ {0.9, 0.5, -0.9}, Figures 2.1 to 2.3 each present one realisation of the corresponding AR(1) process with T = 240 observations. To generate these series, we used realisations of normally distributed pure random processes with mean zero and variance one. We always dropped the first 60 observations to eliminate the dependence on the initial values.
   The realisation for α = 0.9, presented in Figure 2.1, is relatively smooth. This is to be expected given the theoretical autocorrelation function because random variables with a considerable distance between each other still have high positive correlations.
   The development of the realisation in Figure 2.2 with α = 0.5 is much less systematic. The geometric decrease of the theoretical autocorrelation function is rather fast. The fourth order autocorrelation coefficient is only 0.0625.
   Contrary to this, the realisation of the AR(1) process with α = -0.9, presented in Figure 2.3, follows a well pronounced zigzag course with, however, alternating positive and negative amplitudes. This is consistent with the theoretical autocorrelation function indicating that all random variables with even-numbered distance are positively correlated and those with odd-numbered distance negatively correlated.
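The simulation design of this example can be sketched as follows (a hedged sketch, not the code used for the figures; NumPy and a simple sample-autocorrelation helper are assumed): δ = 0, α ∈ {0.9, 0.5, -0.9}, standard normal innovations, 60 burn-in observations dropped, T = 240 observations kept. The estimated low-order autocorrelations should then be close to α^τ.

```python
import numpy as np

# Sketch of the simulation design of Example 2.1 (not the original code):
# delta = 0, alpha in {0.9, 0.5, -0.9}, standard normal innovations,
# 60 burn-in observations dropped, T = 240 observations kept.
rng = np.random.default_rng(3)

def simulate_ar1(alpha, T=240, burn_in=60):
    u = rng.standard_normal(T + burn_in)
    x = np.zeros(T + burn_in)
    for t in range(1, T + burn_in):
        x[t] = alpha * x[t - 1] + u[t]
    return x[burn_in:]

def sample_acf(x, nlags):
    xd = x - x.mean()
    c0 = np.dot(xd, xd) / len(x)
    return np.array([np.dot(xd[: len(x) - k], xd[k:]) / len(x) / c0
                     for k in range(nlags + 1)])

for alpha in (0.9, 0.5, -0.9):
    rho_hat = sample_acf(simulate_ar1(alpha), nlags=4)
    print(alpha, np.round(rho_hat[1:], 2))    # should be close to alpha^tau
```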




Figure 2.1: AR(1) process with α = 0.9 (panels: a) Realisation; b) Theoretical autocorrelation function; c) Estimated autocorrelation function with confidence intervals)



Figure 2.2: AR(1) process with α = 0.5 (panels: a) Realisation; b) Theoretical autocorrelation function; c) Estimated autocorrelation function with confidence intervals)



Figure 2.3: AR(1) process with α = -0.9 (panels: a) Realisation; b) Theoretical autocorrelation function; c) Estimated autocorrelation function with confidence intervals)

It generally holds that the closer the parameter α is to +1, the smoother the realisations will be. For negative values of α we get zigzag developments which are the more pronounced the closer α is to -1. For α = 0 we get a pure random process.
The autocorrelation functions estimated by means of relation (1.10) with the given
realisations are also presented in Figures 2.1 to 2.3. The dotted parallel lines show
approximate 95 percent confidence intervals for the null hypothesis assuming that
the true process is a pure random process. In all three cases, the estimated func-
tions reflect quite well the typical development of the theoretical autocorrelations.

Example 2.2
In a paper on the effect of economic development on the electoral chances of the
German political parties during the period of the social-liberal coalition from 1969
to 1982, GEBHARD KIRCHGÄSSNER (1985) investigated (besides other issues) the
time series properties of the popularity series of the parties constructed by monthly
surveys of the Institute of Demoscopy in Allensbach (Germany). For the period
from January 1971 to April 1982, the popularity series of the Christian Democrat-
ic Union (CDU), i.e. the share of voters who answered that they would vote for
this party (or its Bavarian sister party, the CSU) if there were a general election by
the following Sunday, is given in Figure 2.4. The autocorrelation and the partial
autocorrelation function (which is discussed in Section 2.1.4) are also presented in
this figure. While the autocorrelation function goes slowly towards zero, the partial autocorrelation function breaks off after τ = 1. This argues for an AR(1) process.
   The model has been estimated with Ordinary Least Squares (OLS), the method
proposed in Section 2.1.5 for the estimation of autoregressive models. Thus, we
get:
              CDU_t = 8.053 + 0.834 CDU_{t-1} + û_t,
                      (3.43)  (17.10)
              R̄² = 0.683, SE = 1.586, Q(11) = 12.516 (p = 0.326).
The estimated t values are given in parentheses, SE denotes the standard error of
the residuals. The autocorrelogram, which is also given in Figure 2.4, does not in-
dicate any higher-order process. Moreover, given the high p-value, the Ljung-Box
Q statistic with 12 correlation coefficients (i.e. with 11 degrees of freedom) gives
no reason to reject this model. The mean is calculated as

         μ̂ = 8.053/(1 − 0.834) = 48.512 .

It shows that about 48.5 percent of the voters voted on average for the CDU dur-
ing this period.
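Since the Allensbach popularity series is not reproduced here, the following sketch illustrates the estimation step on a simulated AR(1) series with hypothetical parameters of a similar magnitude: x_t is regressed by OLS on a constant and x_{t-1}, and the implied mean is recovered as δ̂/(1 − α̂), exactly as in the calculation above.

```python
import numpy as np

# Sketch of the OLS step of Example 2.2 on a simulated AR(1) series with
# hypothetical parameters of a similar magnitude (the Allensbach data are
# not reproduced here): regress x_t on a constant and x_{t-1}, then recover
# the implied mean delta_hat / (1 - alpha_hat).
rng = np.random.default_rng(4)
delta, alpha, sigma, T = 8.0, 0.83, 1.6, 136

x = np.empty(T)
x[0] = delta / (1 - alpha)                     # start at the theoretical mean
for t in range(1, T):
    x[t] = delta + alpha * x[t - 1] + sigma * rng.standard_normal()

Y = x[1:]
X = np.column_stack([np.ones(T - 1), x[:-1]])  # constant and lagged value
delta_hat, alpha_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
print(delta_hat, alpha_hat, delta_hat / (1 - alpha_hat))   # implied mean
```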




Figure 2.4: Popularity of the CDU/CSU, 1971 – 1982 (panels: a) Popularity of the CDU/CSU, 1971 – 1982; b) Autocorrelation and partial autocorrelation functions with confidence intervals; c) Estimated autocorrelation function of the residuals of the estimated AR(1) process with confidence intervals)

Stability Conditions

Along with the stochastic initial value, the condition |α| < 1, the so-called stability condition, is crucial for the stationarity of the AR(1) process. We can also derive the stability condition from the linear homogeneous difference equation, which is given for the process itself by

         x_t − α x_{t-1} = 0,

for its autocovariances by

         γ(τ) − α γ(τ-1) = 0

and for the autocorrelations by

         ρ(τ) − α ρ(τ-1) = 0.

These difference equations have stable solutions, i.e. lim_{τ→∞} ρ(τ) = 0, if and only if their characteristic equation

(2.10)                 λ − α = 0

has a solution (root) with an absolute value smaller than one, i.e. if |α| < 1 holds. We get an equivalent condition if we do not consider the characteristic equation but the lag polynomial of the corresponding difference equations,

(2.11)                 1 − αL = 0.

This implies that the solution has to be larger than one in absolute value.
(Strictly speaking, L, which denotes an operator, has to be substituted by a
variable, which is often denoted by ‘z’. To keep the notation simple, we
use L in both meanings.)
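A minimal numerical version of this check (hypothetical α, NumPy assumed): the root of the characteristic equation must lie inside the unit circle, equivalently the root of the lag polynomial, with L treated as the variable z, must lie outside it.

```python
import numpy as np

# Numerical version of the stability check for a hypothetical alpha:
# the root of the characteristic equation lambda - alpha = 0 must lie
# inside the unit circle; equivalently, the root of the lag polynomial
# 1 - alpha*z = 0 (with L replaced by the variable z) must lie outside it.
alpha = 0.8
lam = np.roots([1.0, -alpha])       # characteristic equation: [0.8]
z = np.roots([-alpha, 1.0])         # lag polynomial:           [1.25]
print(np.abs(lam), np.abs(z))
```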

Example 2.3
Let us consider the stochastic process
(E2.1)                               yt = xt + vt .
In this equation, x_t is a stationary AR(1) process, x_t = α x_{t-1} + u_t, with |α| < 1; v_t is a pure random process with mean zero and constant variance σ_v² which is uncorrelated with the other pure random process u_t with mean zero and constant variance σ_u².
    We can interpret the stochastic process yt as an additive decomposition of two
stationary components. Then yt itself is stationary. In the sense of MILTON
FRIEDMAN (1957) we can interpret xt as the permanent (systematic) and vt as the
transitory component.

  What does the correlogram of y_t look like? As both x_t and v_t have zero mean, E[y_t] = 0. Multiplying (E2.1) with y_{t-τ} and taking expectations results in

         E[y_{t-τ} y_t] = E[y_{t-τ} x_t] + E[y_{t-τ} v_t] .

Due to y_{t-τ} = x_{t-τ} + v_{t-τ}, we get

         E[y_{t-τ} y_t] = E[x_{t-τ} x_t] + E[v_{t-τ} x_t] + E[x_{t-τ} v_t] + E[v_{t-τ} v_t] .

As u_t and v_t are uncorrelated, it holds that E[v_{t-τ} x_t] = E[x_{t-τ} v_t] = 0, and because of the stationarity of the two processes, we can write

(E2.2)                 γ_y(τ) = γ_x(τ) + γ_v(τ) .

For τ = 0 we get the variance of y_t as

         γ_y(0) = γ_x(0) + σ_v² = σ_u²/(1 − α²) + σ_v² .

For τ > 0, because of γ_v(τ) = 0 for τ ≠ 0, we get from (E2.2)

         γ_y(τ) = γ_x(τ) = α^τ σ_u²/(1 − α²) .

Thus, we finally get

         ρ_y(τ) = α^τ / (1 + (1 − α²) σ_v²/σ_u²) ,    τ = 1, 2, ...,

for the correlogram of y_t. The overlay of the systematic component by the transitory component reduces the autocorrelation generated by the systematic component. The larger the variance of the transitory component, the stronger is this effect.
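The following sketch (hypothetical α, σ_u², σ_v²) computes this correlogram and compares it with the estimated autocorrelations of one long simulated realisation of y_t = x_t + v_t.

```python
import numpy as np

# Sketch of the correlogram of y_t = x_t + v_t derived in this example,
# rho_y(tau) = alpha^tau / (1 + (1 - alpha^2) * sigma_v^2 / sigma_u^2),
# checked against one long simulated realisation (hypothetical values).
rng = np.random.default_rng(5)
alpha, sigma2_u, sigma2_v, T = 0.8, 1.0, 2.0, 200_000

u = np.sqrt(sigma2_u) * rng.standard_normal(T)
v = np.sqrt(sigma2_v) * rng.standard_normal(T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = alpha * x[t - 1] + u[t]
y = x + v                                     # permanent plus transitory part

def sample_acf(z, k):
    zd = z - z.mean()
    return np.dot(zd[: len(z) - k], zd[k:]) / np.dot(zd, zd)

for tau in (1, 2, 3):
    theo = alpha ** tau / (1.0 + (1.0 - alpha ** 2) * sigma2_v / sigma2_u)
    print(tau, round(theo, 3), round(sample_acf(y, tau), 3))
```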


2.1.2 Second Order Autoregressive Processes

Generalising (2.1), the second order autoregressive process (AR(2)) can
be written as
(2.12)                 x_t = δ + α₁ x_{t-1} + α₂ x_{t-2} + u_t,

with u_t denoting a pure random process with variance σ² and α₂ ≠ 0. With
the lag operator L we get

(2.13)                 (1 − α₁L − α₂L²) x_t = δ + u_t .

With α(L) = 1 − α₁L − α₂L² we can write

(2.14)                 α(L) x_t = δ + u_t .

As for the AR(1) process, we get the Wold representation from (2.14) if
we invert α(L); i.e. under the assumption that α⁻¹(L) exists and has the
property

(2.15)                 α(L) α⁻¹(L) = 1

we can ‘solve’ for x_t in (2.14):

(2.16)                 x_t = α⁻¹(L) δ + α⁻¹(L) u_t .

If we use the series expansion with undetermined coefficients for

         α⁻¹(L) = ψ₀ + ψ₁L + ψ₂L² + ...

it has to hold that

         1 = (1 − α₁L − α₂L²)(ψ₀ + ψ₁L + ψ₂L² + ψ₃L³ + ... )

because of (2.15). This relation is an identity only if the coefficients of L^j,
j = 0, 1, 2, ..., are equal on both the right and the left hand side. We get

         1 =   ψ₀ + ψ₁L     + ψ₂L²     + ψ₃L³     + ...
                  − α₁ψ₀L   − α₁ψ₁L²   − α₁ψ₂L³   − ...
                            − α₂ψ₀L²   − α₂ψ₁L³   − ... .

Comparing the coefficients of the lag polynomials on the right- and left-
hand side finally leads to

         L⁰:                           ψ₀ = 1
         L¹:  ψ₁ − α₁ψ₀ = 0            ψ₁ = α₁
         L²:  ψ₂ − α₁ψ₁ − α₂ψ₀ = 0     ψ₂ = α₁² + α₂
         L³:  ψ₃ − α₁ψ₂ − α₂ψ₁ = 0     ψ₃ = α₁³ + 2α₁α₂

By applying this so-called method of undetermined coefficients, we get the
values ψ_j, j = 2, 3, ..., from the linear homogeneous difference equation

         ψ_j − α₁ψ_{j-1} − α₂ψ_{j-2} = 0

with the initial conditions ψ₀ = 1 and ψ₁ = α₁.
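A minimal sketch of this recursion (hypothetical coefficients α₁ = 1.5, α₂ = -0.56):

```python
import numpy as np

# Method of undetermined coefficients for an AR(2) process with
# hypothetical coefficients: psi_0 = 1, psi_1 = alpha_1, and
# psi_j = alpha_1 * psi_{j-1} + alpha_2 * psi_{j-2} for j >= 2.
alpha1, alpha2, J = 1.5, -0.56, 10

psi = np.empty(J)
psi[0], psi[1] = 1.0, alpha1
for j in range(2, J):
    psi[j] = alpha1 * psi[j - 1] + alpha2 * psi[j - 2]

# first values: 1, alpha_1, alpha_1^2 + alpha_2, alpha_1^3 + 2*alpha_1*alpha_2
print(np.round(psi[:4], 3))
```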
   The stability condition for the AR(2) process requires that, for j → ∞,
the ψ_j converge to zero, i.e. that the characteristic equation of (2.12),

(2.17)                 λ² − α₁λ − α₂ = 0,

has only roots with absolute values smaller than one, or that all solutions
of the lag polynomial in (2.13),

(2.18)                 1 − α₁L − α₂L² = 0,

are larger than one in modulus. Together with stochastic initial conditions,
this guarantees the stationarity of the process. The stability conditions are
fulfilled if the following parameter restrictions hold jointly for (2.17) and
(2.18):

         1 + (−α₁) + (−α₂) > 0,
         1 − (−α₁) + (−α₂) > 0,
         1 − (−α₂) > 0.

As a constant is not changed by the application of the lag operator, the
number ‘1’ can substitute the lag operator in the corresponding terms.
Thus, due to (2.16), the Wold representation of the AR(2) process is given
by

(2.19)   x_t = δ/(1 − α₁ − α₂) + Σ_{j=0}^{∞} ψ_j u_{t-j} ,    ψ₀ = 1.


Under the assumption of stationarity, the expected value of the stochastic
process can be calculated directly from (2.12) since E[x_t] = E[x_{t-1}] = E[x_{t-2}]
= μ. We get

         μ = δ + α₁μ + α₂μ

or

(2.20)                 E[x_t] = μ = δ/(1 − α₁ − α₂) .

As the stability conditions are fulfilled, 1 − α₁ − α₂ > 0 holds, i.e. the sign
of δ also determines the sign of μ.
   In order to calculate the second order moments, we can assume – without loss of generality – that δ = 0, which is equivalent to μ = 0. Multiplying (2.12) with x_{t-τ}, τ ≥ 0, and taking expectations leads to

(2.21)   E[x_{t-τ} x_t] = α₁ E[x_{t-τ} x_{t-1}] + α₂ E[x_{t-τ} x_{t-2}] + E[x_{t-τ} u_t] .

Because of representation (2.19), relation (2.8) holds here as well. This
leads to the following equations

         τ = 0:   γ(0) = α₁γ(1) + α₂γ(2) + σ²
(2.22)   τ = 1:   γ(1) = α₁γ(0) + α₂γ(1)
         τ = 2:   γ(2) = α₁γ(1) + α₂γ(0)

and, more generally, the following difference equation holds for the autocovariances γ(τ), τ ≥ 2,

(2.23)                 γ(τ) − α₁γ(τ-1) − α₂γ(τ-2) = 0.

As the stability conditions hold, the autocovariances which can be recursively calculated with (2.23) are converging to zero for τ → ∞.
   The relations (2.22) result in

(2.24)   V[x_t] = γ(0) = (1 − α₂)σ² / [(1 + α₂)((1 − α₂)² − α₁²)]

for the variance of the AR(2) process, and in

         γ(1) = α₁σ² / [(1 + α₂)((1 − α₂)² − α₁²)] ,

and

         γ(2) = (α₁² + α₂ − α₂²)σ² / [(1 + α₂)((1 − α₂)² − α₁²)] ,

for the autocovariances of order one and two.
   The autocorrelations can be calculated accordingly. If we divide (2.23)
by the variance γ(0) we get the linear homogeneous second order difference equation,

(2.25)                 ρ(τ) − α₁ρ(τ-1) − α₂ρ(τ-2) = 0

with the initial conditions ρ(0) = 1 and ρ(1) = α₁/(1 − α₂) for the autocorrelation function. Depending on the values of α₁ and α₂, AR(2) processes can
generate quite different developments, and, therefore, these processes can
show considerably different characteristics.

Example 2.4
Let us consider the AR(2) process
(E2.3)                     xt = 1 + 1.5 xt-1 – 0.56 xt-2 + ut
with a variance of u_t of 1. Because the characteristic equation

         λ² − 1.5λ + 0.56 = 0

has the two roots λ₁ = 0.8 and λ₂ = 0.7, (E2.3) is stationary, given that we have
stochastic initial conditions. The expected value of this process is

         μ = 1/(1 − 1.5 + 0.56) ≈ 16.67 .

The variance of (E2.3) can be calculated from (2.24) as γ(0) = 19.31. A realisation
of this process (with 180 observations) is given in Figure 2.5 in which the (estimated) mean was subtracted. Thus, the realisations fluctuate around zero, and the
process always tends to go back to the mean. This mean-reverting behaviour is a
typical property of stationary processes.
   Due to (2.25) we get

         ρ(τ) − 1.5ρ(τ-1) + 0.56ρ(τ-2) = 0,   τ = 2, 3, ...,
         with ρ(0) = 1, ρ(1) = 0.96

for the autocorrelation function. The general solution of this homogeneous difference equation is

         ρ(τ) = C₁(0.8)^τ + C₂(0.7)^τ ,

where C₁ and C₂ are two arbitrary constants. Taking into account the two initial
conditions we get

         ρ(τ) = 2.6(0.8)^τ − 1.6(0.7)^τ

for the autocorrelation coefficients. This development is also expressed in Figure
2.5. The coefficients are always positive but strictly monotonically decreasing.
Initially, the estimated autocorrelogram using the given realisation is also monotonically decreasing, but, contrary to the theoretical development, the values begin
to fluctuate from the tenth lag onwards. However, except for the coefficient for τ =
16, the estimates are not significantly different from zero; they are all inside the
approximate 95 percent confidence interval indicated by the dotted lines.
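The quantities of this example can be reproduced with a few lines (a sketch assuming NumPy; the realisation and its estimated autocorrelogram are not replicated):

```python
import numpy as np

# Reproducing the quantities of Example 2.4 (the realisation itself is not
# replicated here): roots of the characteristic equation, mean, variance
# from (2.24) and the autocorrelations from the recursion (2.25).
delta, alpha1, alpha2, sigma2 = 1.0, 1.5, -0.56, 1.0

print(np.roots([1.0, -alpha1, -alpha2]))      # roots 0.8 and 0.7
print(delta / (1 - alpha1 - alpha2))          # mean, about 16.67

gamma0 = ((1 - alpha2) * sigma2 /
          ((1 + alpha2) * ((1 - alpha2) ** 2 - alpha1 ** 2)))   # (2.24)
print(gamma0)                                 # about 19.31

rho = np.empty(15)
rho[0], rho[1] = 1.0, alpha1 / (1 - alpha2)
for tau in range(2, 15):
    rho[tau] = alpha1 * rho[tau - 1] + alpha2 * rho[tau - 2]
print(np.round(rho[:5], 3))     # matches 2.6*(0.8)^tau - 1.6*(0.7)^tau
```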

The characteristic equations of stable autoregressive processes of second
or higher order can result in conjugate complex roots. In this case, the time
series exhibit dampened oscillations, which are shocked again and again
by the pure random process. The solution of the homogeneous part of
(2.12) for conjugate complex roots can be represented by
         x_t = d^t (C₁ cos(f t) + C₂ sin(f t))

with C₁ and C₂ again being arbitrary constants that can be determined by
using the initial conditions. The dampening factor

         d = √(−α₂)

corresponds to the modulus of the two roots, and

         f = arccos( α₁ / (2√(−α₂)) )

is the frequency of the oscillation. The period of the cycles is P = 2π/f.
Processes with conjugate complex roots are well-suited to describe business cycle fluctuations.

Figure 2.5: AR(2) process with α₁ = 1.5 and α₂ = -0.56 (panels: a) Realisation; b) Theoretical autocorrelation function; c) Estimated autocorrelation function with confidence intervals)

Figure 2.6: AR(2) process with α₁ = 1.4 and α₂ = -0.85 (panels: a) Realisation; b) Theoretical autocorrelation function; c) Estimated autocorrelation function with confidence intervals)

Example 2.5
Consider the AR(2) process
(E2.4)                      xt = 1.4 xt-1 – 0.85 xt-2 + ut,
with a variance of u_t of 1. The characteristic equation

         λ² − 1.4λ + 0.85 = 0

has the two solutions λ₁ = 0.7 + 0.6i and λ₂ = 0.7 − 0.6i. (‘i’ stands for the imaginary unit: i² = −1.) The modulus (dampening factor) is d = 0.922. Thus, (E2.4)
with stochastic initial conditions and a mean of zero is stationary. According to
(2.24) the variance is given by γ(0) = 8.433.
   A realisation of this process with 180 observations is given in Figure 2.6. Its
development is cyclical around its zero mean. For the autocorrelation function we
get

         ρ(τ) − 1.4ρ(τ-1) + 0.85ρ(τ-2) = 0,   τ = 2, 3, ...,
         ρ(0) = 1, ρ(1) = 0.76,

because of (2.25).
   The general solution is

         ρ(τ) = 0.922^τ (C₁ cos(0.709τ) + C₂ sin(0.709τ)) .

Taking into account the two initial conditions, we get for the autocorrelation coefficients

         ρ(τ) = 0.922^τ (cos(0.709τ) + 0.1 sin(0.709τ)) ,

with a frequency of f = 0.709.
   In case of quarterly data, this corresponds to a period length of about 9 quarters.
Both the theoretical and the estimated autocorrelations in Figure 2.6 show this
kind of dampened periodical behaviour.
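A short sketch of these calculations, with the coefficients taken from (E2.4) and NumPy assumed:

```python
import numpy as np

# Cycle quantities of Example 2.5, computed from the AR(2) coefficients:
# complex roots, modulus d = sqrt(-alpha_2), frequency
# f = arccos(alpha_1/(2*sqrt(-alpha_2))) and period P = 2*pi/f.
alpha1, alpha2 = 1.4, -0.85

print(np.roots([1.0, -alpha1, -alpha2]))       # 0.7 + 0.6i and 0.7 - 0.6i

d = np.sqrt(-alpha2)                           # modulus (dampening factor)
f = np.arccos(alpha1 / (2.0 * np.sqrt(-alpha2)))
print(round(d, 3), round(f, 3), round(2 * np.pi / f, 1))   # 0.922, 0.709, 8.9
```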

Example 2.6
Figure 2.7 shows the development of the three month money market rate in Frank-
furt (GSR) from the first quarter of 1970 to the last quarter of 1998 as well as the
autocorrelation and the partial autocorrelation functions explained in Section 2.1.4.
Whereas the autocorrelation function tends only slowly towards zero, the partial
autocorrelation function breaks off after two lags. As will be shown below, this
indicates an AR(2) process. For the period from 1970 to 1998, estimation with
OLS results in the following:




Figure 2.7: Three month money market rate in Frankfurt, 1970 – 1998 (panels: a) Three month money market rate in Frankfurt, 1970 – 1998; b) Estimated autocorrelation and partial autocorrelation functions with confidence intervals; c) Estimated autocorrelation function of the residuals of the estimated AR(2) process with confidence intervals)

             GSR_t = 0.575 + 1.407 GSR_{t-1} – 0.498 GSR_{t-2} + û_t,
                     (2.82)  (17.50)           (-6.16)
             R̄² = 0.910, SE = 0.812, Q(6) = 6.475 (p = 0.372),
with t values being again given in parentheses. On the 0.1 percent level, both es-
timated coefficients of the lagged interest rates are significantly different from ze-
ro. The autocorrelogram of the estimated residuals (given in Figure 2.7c) as well
as the Ljung-Box Q statistic which is calculated with 8 correlation coefficients
(and 6 degrees of freedom) does not indicate any higher order process.
The two roots of the process are 0.70 ± 0.06i, i.e. they indicate dampened cycles.
The modulus (dampening factor) is d = 0.706; the frequency f = 0.079 corresponds
to a period of 79.7 quarters and therefore of nearly 20 years. Correspondingly, this
oscillation cannot be detected in the estimated autocorrelogram presented in Fig-
ure 2.7b.


2.1.3 Higher Order Autoregressive Processes

An AR(p) process can be described by the following stochastic difference
equation,
(2.26)       x_t = δ + α₁x_{t-1} + α₂x_{t-2} + ... + α_p x_{t-p} + u_t,

with α_p ≠ 0, where u_t is again a pure random process with zero mean and
variance σ². Using the lag operator we can also write:

(2.26')      (1 − α₁L − α₂L² − ... − α_p L^p) x_t = δ + u_t .
If we assume stochastic initial conditions, the AR(p) process in (2.26) is
stationary if the stability conditions are satisfied, i.e. if the characteristic
equation

(2.27)       λ^p − α₁λ^{p-1} − α₂λ^{p-2} − ... − α_p = 0

only has roots with absolute values smaller than one, or if the solutions of
the lag polynomial

(2.28)       1 − α₁L − α₂L² − ... − α_p L^p = 0

only have roots with absolute values larger than one.
   If the stability conditions are satisfied, we get the Wold representation
of the AR(p) process by the series expansion of the inverse lag polynomial,

         1/(1 − α₁L − ... − α_p L^p) = 1 + ψ₁L + ψ₂L² + ...

as

(2.29)       x_t = δ/(1 − α₁ − ... − α_p) + Σ_{j=0}^{∞} ψ_j u_{t-j} .


Generalising the approach that was used to calculate the coefficients of the
AR(2) process, the series expansion can again be calculated by the method
of undetermined coefficients.
   From (2.29) we get the constant (unconditional) expectation as

         E[x_t] = δ/(1 − α₁ − ... − α_p) = μ .

Again, similarly to the AR(1) and AR(2) cases, a necessary condition for
stability is

         1 − α₁ − α₂ − ... − α_p > 0.
   Without loss of generality we can set δ = 0, i.e. μ = 0, in order to calculate the autocovariances. Because of γ(τ) = E[x_{t-τ} x_t], we get according to
(2.26)

(2.30)   γ(τ) = E[x_{t-τ}(α₁x_{t-1} + α₂x_{t-2} + ... + α_p x_{t-p} + u_t)] .

For τ = 0, 1, ..., p, it holds that

         γ(0) = α₁γ(1)   + α₂γ(2)   + ... + α_p γ(p)   + σ²
(2.31)   γ(1) = α₁γ(0)   + α₂γ(1)   + ... + α_p γ(p-1)
            ⋮
         γ(p) = α₁γ(p-1) + α₂γ(p-2) + ... + α_p γ(0)

because of the symmetry of the autocovariances and because of E[x_{t-τ} u_t] =
σ² for τ = 0 and zero for τ > 0.
   This is a linear inhomogeneous equation system for given α_i and σ² to
derive the p + 1 unknowns γ(0), γ(1), ..., γ(p). For τ > p we get the linear
homogeneous difference equation to calculate the autocovariances of order
τ > p:

(2.32)       γ(τ) − α₁γ(τ-1) − ... − α_p γ(τ-p) = 0.

If we divide (2.32) by γ(0), we get the corresponding difference equation
to calculate the autocorrelations:

(2.33)       ρ(τ) − α₁ρ(τ-1) − ... − α_p ρ(τ-p) = 0.

The initial conditions ρ(1), ρ(2), ..., ρ(p) can be derived from the so-called
Yule-Walker equations. We get those if we successively insert τ = 1, 2, ...,
p in (2.33), or, if the last p equations in (2.31) are divided by γ(0),

         ρ(1) = α₁          + α₂ρ(1)    + α₃ρ(2)    + ... + α_p ρ(p-1)
(2.34)   ρ(2) = α₁ρ(1)      + α₂        + α₃ρ(1)    + ... + α_p ρ(p-2)
            ⋮
         ρ(p) = α₁ρ(p-1)    + α₂ρ(p-2)  + α₃ρ(p-3)  + ... + α_p

If we define ρ' = (ρ(1), ρ(2), ..., ρ(p)), α' = (α₁, α₂, ..., α_p) and the (p×p) matrix

              ⎡   1        ρ(1)      ρ(2)    ⋯   ρ(p-1) ⎤
              ⎢  ρ(1)       1        ρ(1)    ⋯   ρ(p-2) ⎥
         R =  ⎢   ⋮          ⋮         ⋮             ⋮   ⎥
              ⎣ ρ(p-1)    ρ(p-2)    ρ(p-3)   ⋯     1    ⎦

we can write the Yule-Walker equations (2.34) in matrix form,

(2.35)                 ρ = R α .

If the first p autocorrelation coefficients are given, the coefficients of the
AR(p) process can be calculated according to (2.35) as

(2.36)                 α = R⁻¹ ρ .
Equations (2.35) and (2.36) show that there is a one-to-one mapping between the p coefficients α and the first p autocorrelation coefficients ρ of
an AR(p) process. If there is a generating pure random process, it is sufficient to know either α or ρ to identify the AR(p) process. Thus, there are
two possibilities to describe the structure of an autoregressive process of
order p: the parametric representation that uses the parameters α₁, α₂, ..., α_p,
and the non-parametric representation with the first p autocorrelation coefficients ρ(1), ρ(2), ..., ρ(p). Both representations contain exactly the same
information. Which representation is used depends on the specific situation. We usually use the parametric representation to describe finite order
autoregressive processes (with known order).
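As an illustration of (2.35) and (2.36), the following sketch (hypothetical AR(2) coefficients, NumPy assumed) first computes ρ(1) and ρ(2) from the Yule-Walker equations and then recovers the coefficients from them via the matrix R:

```python
import numpy as np

# Yule-Walker equations for a hypothetical AR(2) process: compute rho(1)
# and rho(2) from the coefficients, then recover the coefficients via
# alpha = R^{-1} rho as in (2.36).
alpha1, alpha2 = 1.5, -0.56

rho1 = alpha1 / (1 - alpha2)                   # from the first equation of (2.34)
rho2 = alpha1 * rho1 + alpha2                  # from the second equation
rho = np.array([rho1, rho2])

R = np.array([[1.0, rho1],
              [rho1, 1.0]])                    # Toeplitz autocorrelation matrix
print(np.linalg.solve(R, rho))                 # recovers [1.5, -0.56]
```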

Example 2.7
Let the fourth order autoregressive process
         x_t = α₄ x_{t-4} + u_t,   0 < α₄ < 1,

be given, where u_t is again white noise with zero mean and variance σ². Applying
(2.31) we get:

         γ(0) = α₄γ(4) + σ²,
         γ(1) = α₄γ(3),
         γ(2) = α₄γ(2),
         γ(3) = α₄γ(1),
         γ(4) = α₄γ(0).

From these relations we get

         γ(0) = σ²/(1 − α₄²),
         γ(1) = γ(2) = γ(3) = 0,
         γ(4) = α₄σ²/(1 − α₄²).

As can easily be seen, only the autocovariances with lag τ = 4j, j = 1, 2, ... are different from zero, while all other autocovariances are zero. Thus, for τ > 0 we get
the autocorrelation function

         ρ(τ) = α₄^j   for τ = 4j, j = 1, 2, ...,
         ρ(τ) = 0      elsewhere.

Only every fourth autocorrelation coefficient is different from zero; the sequence
of these autocorrelation coefficients decreases monotonically like a geometric se-
ries. Employing such a model for quarterly data, this AR(4) process captures the
correlation between random variables that are distant from each other by a multiple of four periods, i.e. the structure of the correlations of all variables which
belong to the i-th quarter of a year, i = 1, 2, 3, 4, follows an AR(1) process while
the correlations between variables that belong to different quarters are always ze-
ro. Such an AR(4) process provides a simple possibility of modelling seasonal ef-
fects which typically influence the same quarters of different years. For empirical
applications, it is advisable to first eliminate the deterministic component of a sea-
sonal variation by employing seasonal dummies and then to model the remaining
seasonal effects by such an AR(4) process.
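A small simulation sketch of this seasonal AR(4) structure (hypothetical α₄ = 0.6, NumPy assumed):

```python
import numpy as np

# Simulation sketch of the seasonal AR(4) process x_t = alpha_4 x_{t-4} + u_t
# with a hypothetical alpha_4 = 0.6: only every fourth estimated
# autocorrelation should differ noticeably from zero.
rng = np.random.default_rng(6)
alpha4, T = 0.6, 100_000

u = rng.standard_normal(T)
x = np.zeros(T)
for t in range(4, T):
    x[t] = alpha4 * x[t - 4] + u[t]

xd = x - x.mean()
acf = np.array([np.dot(xd[: T - k], xd[k:]) / np.dot(xd, xd) for k in range(9)])
print(np.round(acf, 2))     # close to 1, 0, 0, 0, 0.6, 0, 0, 0, 0.36
```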


2.1.4 The Partial Autocorrelation Function

Due to the stability conditions, autocorrelation functions of stationary fi-
nite order autoregressive processes are always sequences that converge to
zero but do not break off. This makes it difficult to distinguish between
processes of different orders when using the autocorrelation function. To
cope with this problem, we introduce a new concept, the partial autocorre-
lation function. The partial correlation between two random variables is
the correlation that remains if the possible impact of all other random vari-
ables has been eliminated. To define the partial autocorrelation coefficient,
we use the new notation,


         x_t = φ_{k1} x_{t-1} + φ_{k2} x_{t-2} + … + φ_{kk} x_{t-k} + u_t ,

where φ_{ki} is the coefficient of the variable with lag i if the process has order k. (According to the former notation it holds that α_i = φ_{ki}, i = 1, 2, …, k.)
The coefficients φ_{kk} are the partial autocorrelation coefficients (of order k),
k = 1, 2, … . The partial autocorrelation measures the correlation between x_t
and x_{t-k} which remains when the influences of x_{t-1}, x_{t-2}, ..., x_{t-k+1} on x_t and
x_{t-k} have been eliminated.
   Due to the Yule-Walker equations (2.35), we can derive the partial au-
tocorrelation coefficients kk from the autocorrelation coefficients if we
calculate the coefficients kk, which belong to xt-k, for k = 1, 2, ... from the
corresponding linear equation systems
     1           (1)             (2)                (k 1)          k1            (1)
     (1)         1               (2)                (k 2)          k2            (2)
                                                                                          , k = 1, 2, ... .

   (k 1)     (k 2)             (k 3)                    1          kk            (k)

With Cramer’s rule we get
                               1              (1)              (1)
                               (1)            1                (2)

                              (k 1)       (k 2)                (k)
(2.37)      kk                                                     , k = 1, 2, ... .
                              1           (1)                (k 1)
                              (1)         1                  (k 2)

                          (k 1)          (k 2)                 1
Thus, if the data generating process (DGP) is an AR(1) process, we get for
the partial autocorrelation function:
                     11   =     (1)
                                 1           (1)
                                 (1)         (2)            (2) (1) 2
                     22   =                         =                            = 0,
                                 1           (1)            1 (1) 2
                                 (1)         1
54    Univariate Stationary Processes

because of (2) = (1)2. Generally, the partial autocorrelation coefficients
 kk = 0 for k >1 in an AR(1) process.
   If the DGP is an AR(2) process, we get
                                     (2) (1)2
             11   = (1),    22   =            ,        kk   = 0 for k > 2 .
                                     1 (1)2
The same is true for an AR(p) process: all partial autocorrelation coeffi-
cients of order higher than p are zero. Thus, for finite order autoregressive
processes, the partial autocorrelation function provides the possibility of
identifying the order of the process by the order of the last non-zero partial
autocorrelation coefficient. We can estimate the partial autocorrelation co-
efficients consistently by substituting the theoretical values in (2.37) by
their consistent estimates (1.10). For the partial autocorrelation coefficients
which have a theoretical value of zero, i.e. the order of which is larger than
the order of the process, we get asymptotically that they are normally dis-
tributed with E[ ˆ kk ] = 0 and V[ ˆ kk ] = 1/T for k > p .

Example 2.8
The AR(1) process of Example 2.1 has the following theoretical partial autocorre-
lation function: 11 = (1) = and zero elsewhere. In this example, takes on the
values 0.9, 0.5 and -0.9. The estimates of the partial autocorrelation functions for
the realisations in Figures 2.1 and 2.3 are presented in Figure 2.8. It is obvious for
both processes that these are AR(1) processes. The estimated value for the process
with = 0.9 is ˆ 11 = 0.91, while all other partial autocorrelation coefficients are
not significantly different from zero. We get ˆ = -0.91 for the process with
                                                  11

= -0.9, while all estimated higher order partial autocorrelation coefficients do not
deviate significantly from zero.
     The AR(2) process of Example 2.4 has the following theoretical partial auto-
correlation function: 11 = 0.96, 22 = -0.56 and zero elsewhere. The realisation of
this process, which is given in Figure 2.5, leads to the empirical partial autocorre-
lation function in Figure 2.8. It corresponds quite closely to the theoretical func-
tion; we get ˆ 11 = 0.95 and ˆ 22 = -0.60 and all higher order partial autocorrelation
coefficients are not significantly different from zero. The same holds for the
AR(2) process with the theoretical non-zero partial autocorrelations 11 = 0.76 and
                                                          ˆ               ˆ
  22 = -0.85 given in Example 2.5. We get the estimates 11 = 0.76 and 22 = -0.78,
whereas all higher order partial correlation coefficients are not significantly differ-
ent from zero.
2.1 Autoregressive Processes    55




        kk
    1
  0.8
  0.6
  0.4
  0.2
    0                                                              k
 -0.2              5              10                 15       20
 -0.4
 -0.6
 -0.8
             AR(1) process with    = 0.9
   -1

        kk
    1
  0.8
  0.6
  0.4
  0.2
    0                                                              k
 -0.2              5              10                  15      20
 -0.4
 -0.6
 -0.8
             AR(1) process with    = -0.9
   -1

        kk
    1
  0.8
  0.6
  0.4
  0.2
    0                                                              k
 -0.2              5              10                  15      20
 -0.4
 -0.6
 -0.8
             AR(2) process with    1   = 1.5,   2   = -0.56
   -1
        kk
    1
  0.8
  0.6
  0.4
  0.2
    0                                                              k
 -0.2              5              10                  15      20
 -0.4
 -0.6
 -0.8
   -1        AR(2) process with    1   = 1.4,   2   = -0.85


Figure 2.8: Estimated partial autocorrelation functions
56    Univariate Stationary Processes

2.1.5 Estimating Autoregressive Processes

Under the assumption of a known order p we have different possibilities to
estimate the parameters:
(i)   If we know the distribution of the white noise process that generates
      the AR(p) process, the parameters can be estimated by using maxi-
      mum likelihood (ML) methods.
(ii) The parameters can also be estimated with the method of moments by
     using the Yule-Walker equations.
(iii) A further possibility is to treat
      (2.26)             xt =    +   1   xt-1 +        2   xt-2 + ... +           p   xt-p + ut,
      as a regression equation and apply the ordinary least squares (OLS)
      method for estimation. OLS provides consistent estimates. Moreover,
      if (2.26) fulfils the stability conditions, T ( ˆ ) as well as
        T( ˆ i   i   ) , i = 1, 2, ..., p, are asymptotically normally distributed.

If the order of the AR process is unknown, it can be estimated with the
help of information criteria. For this purpose, AR processes with succes-
sively increasing orders p = 1, 2, ..., pmax are estimated. Finally, the order
p* is chosen which minimises the respective criterion. The following crite-
ria are often used:
(i)   The final prediction error which goes back to HIROTUGU AKAIKE
      (1969)
                                                             T
                                         T m 1
                             FPE =                                 (u (p) ) 2 .
                                                                    ˆt
                                         T m T              t 1


(ii) Closely related to this is the Akaike information criterion (HIROTUGU
     AKAIKE (1974))
                                                T
                                           1                               2
                            AIC = ln                  (u (p) ) 2
                                                       ˆt              m     .
                                           T    t 1                        T

(iii) Alternatives are the Bayesian criterion of GIDEON SCHWARZ (1978)
                                               T
                                          1                              ln T
                            SC = ln                  (u (p) ) 2
                                                      ˆt             m
                                          T    t 1                        T

(iv) as well as the criterion developed by EDWARD J. HANNAN and
     BARRY G. QUINN (1979)
2.1 Autoregressive Processes   57

                                     T
                                 1                           2 ln(ln T)
                     HQ = ln               (u (p) ) 2
                                            ˆt           m              .
                                 T   t 1                         T

 u (p) are the estimated residuals of the AR(p) process, while m is the number
 ˆt
of estimated parameters. If the constant term is estimated, too, m = p + 1
for an AR(p) process. These criteria are always based on the same princi-
ple: They consist of one part, the sum of squared residuals (or its loga-
rithm), which decreases when the number of estimated parameters increas-
es, and of a ‘penalty term’, which increases when the number of estimated
parameters increases. Whereas the first two criteria overestimate the true
(finite) order asymptotically, the two other criteria estimate the true order
of the process consistently. For T 16, the penalty term of SC is larger
than the one of HQ which itself is larger than the one of AIC. This leads to
the following ordering of the estimated AR orders:
                     SC order      HQ order                  AIC order.
Please note that choosing such an order does not always imply that we
have white noise residuals. This has to be checked independently. Many
computer programmes like, for example, EViews, do not exactly report the
criteria given in (ii) through (iv). Relying on the log-likelihood function
instead of on the sum of squared residuals directly, they add 1 + ln(2 )
2.8379, which does, of course, neither affect the order nor which value of p
minimises the information criteria.

Example 2.9
As in Example 2.6, we take a look at the development of the three month money
market interest rate in Frankfurt am Main. If, for this series, we estimate AR pro-
cesses up to the order p = 4, we get the following results (for T = 116):
             p = 0: AIC = 4.8334, HQ = 4.8430, SC = 4.8571;
             p = 1: AIC = 2.7180, HQ = 2.7373, SC = 2.7655;
             p = 2: AIC = 2.4457, HQ = 2.4746, SC = 2.5169;
             p = 3: AIC = 2.4609, HQ = 2.4995, SC = 2.5559;
             p = 4: AIC = 2.4778, HQ = 2.5260, SC = 2.5965.
With all three criteria we get the minimum for p = 2. Thus, the optimal number of
lags is p* = 2, as used in Example 2.6.
58     Univariate Stationary Processes


2.2 Moving Average Processes

Moving average processes of an infinite order have already occurred when
we presented the Wold decomposition theorem. They are, above all, of
theoretical importance as, in practice, only a finite number of (different)
parameters can be estimated. In the following, we consider finite order
moving average processes. We start with the first order moving average
process and then discuss general properties of finite order moving average
processes.


2.2.1 First Order Moving Average Processes

The first order moving average process (MA(1)) is given by the following
equation:
(2.38)                        xt =            + ut –        ut-1 ,
or
(2.38')                        xt –           = (l – L)ut ,
with ut again being a pure random process. The Wold representation of an
MA(1) process (as of any finite order MA process) has a finite number of
terms. In this special case, the Wold coefficients are 0 = 1, 1 = - and j
                          2
= 0 for j 2. Thus,        j is finite for all finite values of , i.e. an MA(1)
                        j

process is always stationary.
  Taking expectations of (2.38) leads to
                    E[xt] =        + E[ut] –               E[ut-1] =            .
The variance can also be calculated directly,
                   V[xt] = E[(xt – )2]
                             = E[(ut –         ut-1)2]
                                                                     2
                             = E[( u 2 – 2 ut ut-1 +
                                     t                                   u 2 1 )]
                                                                           t

                                      2        2
                             = (1 +       )         =      (0) .
Therefore, the variance is constant at any point of time.
  For the covariances of the process we get
     E[(xt – )(xt+ – )] = E[(ut –             ut-1)(ut+ –           ut+ -1)]
                                                                                     2
                            = E[(utut+ –           utut+   –1   –        ut-1ut+ +       ut-1ut+ -1)] .
2.2 Moving Average Processes   59

The covariances are different from zero only for = ± 1, i.e. for adjoining
random variables. In this case
                                                                   2
                                    (1) = -                            .
Thus, for an MA(1) process, all autocovariances and therefore all autocorre-
lations with an order higher than one disappear, i.e. ( ) = ( ) = 0 for 2.
   The correlogram of an MA(1) process is

             (0) = 1,       (1) =                       2
                                                            ,          ( ) = 0 for   2.
                                            1
If we consider (1) as a function of , (1) = f( ), it holds that f(0) = 0 and
f( ) = -f(- ), i.e. that f( ) is point symmetric to the origin, and that |f( )|
0.5. f( ) has its maximum at = -1 and its minimum at = 1. Thus, an
MA(1) process cannot have a first order autocorrelation above 0.5 or be-
low -0.5.
   If we know the autocorrelation coefficient (1) = 1, for example, by es-
timation, we can derive (estimate) the corresponding parameter by using
the equation for the first order autocorrelation coefficient,
                                        2
                            (1 +            )       1   +          = 0.
The quadratic equation can also be written as
                                2           1
(2.39)                              +                       + 1 = 0,
                                                1

and it has the two solutions
                                         1                                 2
                      1,2   =                               1      1 4     1   .
                                        2 1
Thus, the parameters of the MA(1) process can be estimated non-linearly
with the method of moments: the theoretical moments are substituted by
their consistent estimates and the resulting equation is used for estimating
the parameters consistently.
   Because of | 1| 0.5, the quadratic equation always results in real roots.
They also have the property that 1 2 = 1. This gives us the possibility to
model the same autocorrelation structure with two different parameters,
where one is the inverse of the other.
   In order to get a unique parameterisation, we require a further property
of the MA(1) process. We ask under which conditions the MA(1) process
(2.38) can have an autoregressive representation. By using the lag operator
representation (2.38') we get
60    Univariate Stationary Processes


                                                                  1
                         ut = –                     +                      xt .
                                           1             1             L
An expansion of the series 1/(1 – L) is only possible for                                    < 1 and re-
sults in the following AR( ) process
                                                                            2
                 ut = –                + xt +            xt-1 +                 xt-2 + ...
                            1
or
                                       2
                 xt +     xt-1 +           xt-2 + ... =                           + ut .
                                                                       1
This representation requires the condition of invertibility (  < 1). In this
case, we get a unique parameterisation of the MA(1) process. Applying the
lag polynomial in (2.38'), we can formulate the invertibility condition in
the following way: An MA(1) process is invertible if and only if the root
of the lag polynomial
                                       1– L = 0
is larger than one in modulus.

Example 2.10
The following MA(1) process is given:
(E2.5)                    xt =     t    –        t-1,    t   ~ N(0, 22),
with = -0.5. For this process we get
                            E[xt] = 0,
                            V[xt] = (1 + 0.52)·4 = 5,
                                                 0.5
                                (1) =                            = 0.4,
                                               1 0.52
                                ( ) = 0 for                       2.
Solving the corresponding quadratic equation (2.39) for this value of (1) leads to
the two roots 1 = -2.0 and 2 = -0.5. If we now consider the process
(E2.5a)                   yt =     t    + 2       t-1,       t   ~ N(0, 1),
we obtain the following results:
                            E[yt] = 0,
                            V[yt] = (1 + 2.02)·1 = 5,
2.2 Moving Average Processes        61


                                         2.0
                               (1) =            = 0.4,
                                       1 2.02
                               ( ) = 0 for       2,
i.e. the variances and the autocorrelogram of the two processes (E2.5) and (E2.5a)
are identical. The only difference between them is that (E2.5) is invertible, be-
cause the invertibility condition      < 1 holds, whereas (E2.5a) is not invertible.
Thus, given the structure of the correlations, we can choose the one of the two
processes that fulfils the invertibility condition without imposing any restrictions
on the structure of the process.

With equation (2.37), the partial autocorrelation function of the MA(1)
process can be calculated in the following way:
        11   =   (1),
                  1      (1)
                  (1)    0               (1) 2
        22   =               =                 < 0,
                  1      (1)           1 (1) 2
                  (1)    1

                  1      (1)     (1)
                  (1)    1       0
                  0      (1)     0       (1)3
        33   =                       =                   0 for          0,
                  1      (1)     0     1 2 (1) 2
                  (1)    1       (1)
                  0      (1)     1

                  1      (1)     0      (1)
                  (1)    1       (1)    0
                  0      (1)     1      0
                  0      0       (1)    0                  (1) 4
        44   =                              =                          < 0,
                  1      (1)     0      0     (1      (1) 2 ) 2  (1) 2
                  (1)    1       (1)    0
                  0      (1)     1      (1)
                  0      0       (1)    1
etc.
62    Univariate Stationary Processes


  If is positive, (1) is negative and vice versa. This leads to the two
possible patterns of partial autocorrelation functions, exemplified by =
±0.8:
                 = 0.8,    ii     {-0.49,-0.31,-0.22, -0.17, ... } ,
                 = -0.8,   ii     {0.49,-0.31, 0.22, -0.17, ... } .
Thus, contrary to the AR(1) process, the autocorrelation function of the
MA(1) process breaks off, while the partial autocorrelation function does
not. These properties hold generally, since invertible finite order MA pro-
cesses are equivalent to infinite order AR processes.


2.2.2 MA(1) and Temporal Aggregation

The time series which are discussed in this book are measured in discrete
time, with intervals of equal length. Exchange rates, for example, are nor-
mally quoted at the end of each trading day. For econometric analyses,
however, monthly, quarterly, or even annual data are used, rather than the-
se daily values. Usually, averages or end-of-period data are used for tem-
poral aggregation.
   Thus, two aggregation schemes have to be distinguished. The first one is
skip sampling (or: systematic sampling) where only every mth data point is
recorded. If xt is the basic series at t = 1, 2, 3,…, the skip sampled series ys
with new time scale s is end-of-period data,
               y1 = xm, y2 = x2m, y3 = x3m, …, ys = xsm.
Such an aggregation is typical for stock variables. However, the second
scheme of averaging over m non-overlapping periods is also widely used,
in particular for rates or indices:
                                1
                    y1            xm     xm   1           ... x1
                                m
                                1
                    y2            x 2m    x 2m        1     ... x m   1
                                m


                                1
                    ys            x sm   x sm     1        ... x (s   1)m 1   .
                                m
2.2 Moving Average Processes               63

In the following, we do not present a general theory of temporal aggrega-
tion but just discuss a special case of particular applied interest, the ran-
dom walk, with
                                              xt = xt-1 + ut,
where an artificial MA(1) structure arises due to aggregation by averaging.
It is straightforward to see that systematic sampling does not affect the
random walk property, since in this case we can write
                                                                sm
                                         ys = x0 +                   ut .
                                                               t 1


From this representation we get
                                              ys = ys-1 +             s,

with   s   being white noise:
                        s       = usm + usm-1 + ... + u(s-1)m+1,
with E[ s] = 0 and
                                                               2
                                                         m     u      for     0
                       E(   s   ·       s–    ) =                               .
                                                          0          elsewhere

Hence, the random walk property is inherited by ys, only the variance of
the differences ys – ys-1 is inflated in the obvious way. In case of averaging,
 ys , matters get more complicated. It can, however, be shown that the dif-
ferences
                                              ys    ys   1            s

follow no longer a white noise process but an MA(1) scheme hidden be-
hind
              1
   s            u sm   2u sm        1        ...   mu s      1m 1
                                                                       ...   2u s   2 m 3
                                                                                            us   2 m 2
                                                                                                         .
              m
We omit details but refer to HOLBROOK WORKING (1960) who showed
that with increasing aggregation level, m , one obtains the autocorre-
lation function
64    Univariate Stationary Processes


                                                               1,      0
                                E        s    s                1
                      ( ) =                                      ,      1 .
                                    V         s                4
                                                               0, elsewhere

Note that the above autocorrelation function corresponds to the following
MA(1)-process

                                     s                    us    us    1


where u s is white noise, and the limiting value (for m                        ) of the MA pa-
rameter is
                                             3 2                      0.268.
GEORGE C. TIAO (1972) generalised this result the following way:
If xt – xt-1 is not generated by white noise but by an invertible MA(1) pro-
cess, then ys ys 1 behaves with growing m like the MA(1) process
 us     u s 1 , where is independent of the underlying MA(1) structure of xt
– xt-1. This result even continues to hold when the assumption that xt – xt-1
is MA(1) is replaced by a more general moving average process of higher
order as introduced in subsection 2.2.3.

Example 2.11
Consider averaging over m = 2 periods,
                                                  1
                                ys                  x 2s       x 2s   1
                                                                          .
                                                  2
For the random walk xt = xt-1 + ut, it holds that

                        s       ys           ys   1


                                1
                            =     (x2s + x2s-1 – x2s-2 – x2s-3)
                                2
                                1
                            =     ( u2s + 2 u2s-1 + u2s-2) .
                                2
This process can be described as

                        s       us           us       1


with = 2 2 – 3       –0.172, and
2.2 Moving Average Processes          65


                                                          3       2
                                                                  u       for       0
                                                          2
                                                          1       2
                        E(    s   ·       s   ) =                 u       for       1,
                                                          4
                                                              0            elsewhere

such that for m = 2 the autocorrelation coefficient at lag one becomes (1) = 1/6.

Example 2.12
Example 1.3 as well as Figure 1.8 present the end-of-month exchange rate be-
tween the Swiss Franc and the U.S. Dollar over the period from January 1974 to
December 2011. The autocorrelogram of the first differences of the logarithms of
this time series indicates that they follow a pure random process. The tests we ap-
plied did not reject this null hypothesis.
   If we use monthly averages instead of end-of-month data, the following MA(1)
process can be estimated for the first difference of the logarithms of this exchange
rate:
              ln(et) = -0.003 + ût + 0.308 ût-1,
                          (-1.53)        (6.91)
             R2 =             0.082, SE = 0.028, Q(11) = 8.216 (p = 0.694),
             JB =            21.194 (p = 0.000),
with the t values again given in parentheses. ln(·) denotes the natural logarithm.
The estimated coefficient of the MA(1) term is highly significantly different from
zero. The Ljung-Box Q-statistic indicates that there is no longer any significant
autocorrelation in the residuals. As m 20 is relatively large (in this context), the
estimated values of the MA(1) term should not be too different from the theoreti-
cal value given by GEORGE C. TIAO (1972). The theoretical value -0.268 lies in the
two-sigma confidence interval of the estimated parameter -0.308.


2.2.3 Higher Order Moving Average Processes

In general, the moving average process of order q (MA(q)) can be written
as
(2.40)           xt =         + ut –             1   ut-1 –           2   ut-2 – ... –   q   ut-q
with     q   0 and ut as a pure random process. Using the lag operator we get
                                                                    2                 q
(2.40')             xt –              = (1 –         1L   –       2L      – ... –   qL )ut

                                      =       (L)ut .
66   Univariate Stationary Processes

From (2.40) we see that we already have a finite order Wold representation
with k = 0 for k > q. Thus, there are no problems of convergence, and
every finite MA(q) process is stationary, no matter what values are used
for j, j = 1, 2, ..., q.
   For the expectation of (2.40) we immediately get E[xt] = . Thus, the
variance can be calculated as:
          V[xt] = E[(xt – )2]
                 = E[(ut –     1   ut-1 – ... –                   q    ut-q)2]
                                                                   2
                 = E[( u 2 +
                         t
                                   2
                                   1   u 2 1 + ... +
                                         t                         q   u2 q – 2
                                                                        t                1   utut-1 – ...
                      –2   q-1 q   ut-q+1ut-q)] .
From this we obtain
                                              2               2                  2           2
                  V[xt] = (1 +                1       +       2   + ... +        q   )           .

For the covariances of order we can write
         Cov[xt, xt+ ] = E[(xt – )(xt+ – )]
                      = E[(ut –           1ut-1 – ... – q ut-q)
                         (ut+ –           1 ut+ -1 – ... – q ut+ -q)]

                      = E[ut(ut+ – 1 ut+ -1 – ... – q ut+ -q)
                         – 1 ut-1(ut+ – 1 ut+ -1 – ... – q ut+ -q)

                           –   q   ut-q(ut+ –                 1   ut+ -1 – ... –                     q   ut+ -q)] .
Thus, for = 1, 2, ..., q we get
                                                                                                          2
                  = 1: (1) = (–                   1   +       1    2   + ... +       q-1         q)           ,
                                                                                                          2
                   = 2:    (2) = (–               2   +       1    3   + ... +       q-2         q)           ,
(2.41)

                                                      2
                   = q:    (q) = –            q           ,
while we have ( ) = 0 for > q.
   Consequently, all autocovariances and autocorrelations with orders
higher than the order of the process are zero. It is – at least theoretically –
possible to identify the order of an MA(q) process by using the autocorre-
logram.
   It can be seen from (2.41) that there exists a system of non-linear equa-
tions for given (or estimated) second order moments that determines
(makes it possible to estimate) the parameters 1, ..., q. As we have al-
2.2 Moving Average Processes   67

ready seen in the case of the MA(1) process, such non-linear equation sys-
tems have multiple solutions, i.e. there exist different values for 1, 2, ...
and q that all lead to the same autocorrelation structure. To get a unique
parameterisation, the invertibility condition is again required, i.e. it must
be possible to represent the MA(q) process as a stationary AR( ) process.
Starting from (2.40'), this implies that the inverse operator -1(L) can be
represented as an infinite series in the lag operator, where the sum of the
coefficients has to be bounded. Thus, the representation we get is an
AR( ) process
                                                      -1
                          ut = –                 +         (L) xt
                                          (1)

                                = –              +         c jx t   j   ,
                                          (1)        j 0


where
                                            q
              1 = (1 –     1L   – ... –   qL )(      1 + c1L + c2L2 + ... ),
and the parameters ci, i = 1, 2, ... are calculated by using again the method
of undetermined coefficients. Such a representation exists if all roots of
                                                   q
                           1–     1L   – ... –   qL         = 0
are larger than one in absolute value.

Example 2.13
Let the following MA(2) process
                           xt = ut + 0.6 ut-1 – 0.1 ut-2
be given, with a variance of 1 given for the pure random process u. For the vari-
ance of x we get
                     V[xt] = (1 + 0.36 + 0.01) 1 = 1.37 .
Corresponding to (2.41) the covariances are
                           (1) = + 0.6 – 0.06 = 0.54
                           (2) = – 0.1                                      .
                           ( ) = 0 for > 2
This leads to the autocorrelation coefficients (1) = 0.39 and (2) = -0.07. To
check whether the process is invertible, the quadratic equation
                            1 + 0.6 L       0.1 L2 = 0
68    Univariate Stationary Processes

has to be solved. As the two roots -1.36 and 7.36 are larger than 1 in absolute val-
ue, the invertibility condition is fulfilled, i.e. the MA(2) process can be written as
an AR( ) process
                       xt = (1 + 0.6 L – 0.1 L2) ut ,
                                   1
                       ut =                xt
                              1 0.6L 0.1L2
                          = (1 + c1 L + c2 L2 + c3 L3 +         ) xt .
The unknowns ci, i = 1, 2, ..., can be determined by comparing the coefficients of
the polynomials in the following way:
                1 = (1 + 0.6 L – 0.1 L2)(1 + c1 L + c2 L2 + c3 L3 +           )
                                             2             3
                1 = 1 + c1 L +            c2 L +       c3 L +
                        + 0.6 L + 0.6 c1 L + 0.6 c2 L3 +
                                             2


                                       0.1 L2      0.1 c1 L3
It holds that
                               c1 + 0.6     = 0         c1 =        0.60,
                     c2 + 0.6 c1 – 0.1      = 0          c2 =       0.46,
                     c3 + 0.6 c2 – 0.1 c1 = 0            c3 =       0.34,
                     c4 + 0.6 c3 – 0.1 c2 = 0            c4 =       0.25,
                                                                         .
Thus, we get the following AR( ) representation
           xt – 0.6 xt-1 + 0.46 xt-2 – 0.34 xt-3 + 0.25 xt-4                 = ut .
Similarly to the MA(1) process, the partial autocorrelation function of the MA(q)
process does not break off. As long as the order q is finite, the MA(q) process is
stationary whatever its parameters are. If the order tends towards infinity, howev-
er, for the process to be stationary the series of the coefficients has to converge
just like in the Wold representation.


2.3 Mixed Processes

If we take a look at the two different functions that can be used to identify
autoregressive and moving average processes, we see from Table 2.1 that
the situation in which neither of them breaks off can only arise if there is
an MA( ) process that can be inverted to an AR( ) process, i.e. if the
Wold representation of an AR( ) process corresponds to an MA( ) pro-
cess. However, as pure AR or MA representations, these processes cannot
2.3 Mixed Processes       69

be used for empirical modelling because they can only be characterised by
means of infinitely many parameters. After all, according to the principle
of parsimony, the number of estimated parameters should be as small as
possible when applying time series methods.
   In the following, we introduce processes which contain both an auto-
regressive (AR) term of finite order p and a moving average (MA) term of
finite order q. Hence, these mixed processes are denoted as ARMA(p,q)
processes. They enable us to describe processes in which neither the auto-
correlation nor the partial autocorrelation function breaks off after a finite
number of lags. Again, we start with the simplest case, the ARMA(1,1)
process, and consider the general case afterwards.


          Table 2.1: Characteristics of the Autocorrelation and the Partial
                     Autocorrelation Functions of AR and MA Processes

                                                             Partial Autocorrelation
                       Autocorrelation Function
                                                                     Function

    MA(q)                  breaks off with q                     does not break off

     AR(p)                does not break off                     breaks off with p



2.3.1 ARMA(1,1) Processes

An ARMA(1,1) process can be written as follows,
(2.42)                    xt =       +     xt-1 + ut –       ut-1 ,
or, by using the lag operator
(2.42')                  (1 – L) xt =              + (1 –   L) ut ,
where ut is a pure random process. To get the Wold representation of an
ARMA(1,1) process, we solve (2.42') for xt,
                                                   1    L
                            xt =               +          ut .
                                     1             1    L
It is obvious that        must hold, because otherwise xt would be a pure
random process fluctuating around the mean = /(1 – ). The j, j = 0, 1,
2, ..., can be determined as follows:
70    Univariate Stationary Processes


            1       L                                            2              3
                      =         0       +   1L     +           2L     +       3L       + …
            1       L
                                                                                    2                  3
            1 – L = (1 – L)(                       0       +       1L     +       2L     +           3L     + …)
                                                                  2                     3
            1– L =              0       +   1L  +               2L        +           3L      + …
                                                                   2                    3
                                        –   0 L –               1L        –           2L      – … .
Comparing the coefficients of the two lag polynomials we get
                        L0:         0    = 1
                        L1:         1   –      0   = –                        1   =         –
                        L2:         2   –      1   = 0                        2   =      ( – )
                        L3:         3   –      2   = 0                        3   =     2
                                                                                            ( – )


                        Lj:      j   –      j-1    = 0                        j   =     j-1
                                                                                              ( – ).
The j, j        2 can be determined from the linear homogeneous difference
equation
                                                   j   –        j-1   =0
with 1 = – as initial condition. The j converge towards zero if and
only if | | < 1. This corresponds to the stability condition of the AR term.
Thus, the ARMA(1,1) process is stationary if, with stochastic initial condi-
tions, it has a stable AR(1) term. The Wold representation is
                                                                                                       2
(2.43) xt =                   + ut + ( – ) ut-1 +                         ( – ) ut-2 +                     ( – ) ut-3 + ... .
                    1
Thus, the ARMA(1,1) process can be written as an MA( ) process.
   To invert the MA(1) part, | | < 1 must hold. Starting from (2.42') leads
to
                                                                      1       L
                                 ut =                          +                xt .
                                               1                      1       L
If 1/(1 – L) is developed into a geometric series we get
                                                                      2 2
     ut =               + (1 – L)(1 + L +                             L + ... ) xt
                1

                                                                                                2
        =               + xt + ( – ) xt-1 + ( – ) xt-2 +                                            ( – ) xt-3 + ... .
                1
2.3 Mixed Processes                  71

This proves to be an AR( ) representation. It shows that the combination
of an AR(1) and an MA(1) term leads to a process with both MA( ) and
AR( ) representation if the AR term is stable and the MA term invertible.
   We obtain the first and second order moments of the stationary process
in (2.42) as follows:
                    E[xt] = E[ +                 xt-1 + ut –                       ut-1]
                           =       +           E[xt-1] .
Due to E[xt] = E[xt-1] =   , we get

                                       =                       ,
                                                1
i.e. the expectation is the same as in an AR(1) process.
   If we set = 0 without loss of generality, the expectation is zero. The
autocovariance of order       0 can then be written as
(2.44)            E[xt- xt] = E[xt- ( xt-1 + ut –                                  ut-1)],
which leads to
                     (0) =        (1) + E[xtut] –                  E[xtut-1]
                                           2                                                 2
for = 0. Due to (2.43), E[xtut] =              and E[xtut-1] = ( – )                             . Thus, we can
write
                                                                                     2
(2.45)                (0) =          (1) + (1 – ( – ))                                   .
(2.44) leads to
                   (1) =        (0) + E[xt-1ut] –                  E[xt-1ut-1]
for = 1. Because of (2.43) this can be written as
                                                                   2
(2.46)                         (1) =             (0) –                 .
If we insert (2.46) in (2.45) and solve for (0), the resulting variance of the
ARMA(1,1) process is
                                                     2
                                       1                   2           2
(2.47)                       (0) =                         2
                                                                           .
                                                 1
Inserting this into (2.46), we get
                                       (             )(1           )           2
(2.48)                     (1) =                           2
                                                 1
72      Univariate Stationary Processes

for the first order autocovariance. For                   2, (2.44) results in the autoco-
variances
(2.49)                               ( ) =            ( -1)
and the autocorrelations
(2.50)                               ( ) =            ( -1) .
This results in the same difference equation as in an AR(1) process but,
however, with the different initial condition
                                          (       )(1           )
                             (1) =                2
                                                                    .
                                              1           2
The first order autocorrelation coefficient is influenced by the MA term,
while the higher order autocorrelation coefficients develop in the same
way as in an AR(1) process.
   If the process is stable and invertible, i.e. for | | < 1 and | | < 1, the sign
of (1) is determined by the sign of ( – ) because of (1 + 2 – 2 ) > 0
and (1 – ) > 0. Moreover, it follows from (2.49) that the autocorrelation
function – as in the AR(1) process – is monotonic for > 0 and oscillating
for < 0. Due to | | < 1 with increasing, the autocorrelation function also
decreases in absolute value.
   Thus, the following typical autocorrelation structures are possible:
(i)      > 0 and     > : The autocorrelation function is always positive.
(ii)      < 0 and < : The autocorrelation function oscillates; the initial
        condition (1) is negative.
(iii)     > 0 and     < : The autocorrelation function is negative from (1)
        onwards.
(iv)      < 0 and > : The autocorrelation function oscillates; the initial
        condition (1) is positive.
Figure 2.9 shows the development of the corresponding autocorrelation
functions up to = 20 for the parameter values ,        {0.8, 0.5, -0.5, -0.8}
in which, of course,        must always hold, as otherwise the ARMA(1,1)
process degenerates to a pure random process.
   For the partial autocorrelation function we get
                             (        )(1         )
            11   =   (1) =            2
                                                      ,
                                 1            2
2.3 Mixed Processes        73




              1
            0.8
            0.6
            0.4
            0.2
              0
           -0.2           5          10          15         20
           -0.4
           -0.6
           -0.8
             -1


              1
            0.8
            0.6
            0.4
            0.2
              0
           -0.2           5          10          15         20
           -0.4
           -0.6
           -0.8
             -1


              1
            0.8
            0.6
            0.4
            0.2
              0
           -0.2           5          10          15         20
           -0.4
           -0.6
           -0.8
             -1


              1
            0.8
            0.6
            0.4
            0.2
              0
           -0.2           5          10          15         20
           -0.4
           -0.6
           -0.8
             -1



Figure 2.9: Theoretical autocorrelation functions of ARMA(1,1) processes
74    Univariate Stationary Processes


                    1        (1)
                    (1)      (2)          (2) (1)2                     (1)(   (1))
          22   =                 =                 =                          2
                                                                                   ,
                    1        (1)          1 (1)2                        1 (1)
                    (1)      1
because of (2) =          (1),
                    1        (1)    (1)                      1         (1)           (1)
                    (1)      1     (2)                       (1)       1              (1)
                                                                                   2
                    (2)      (1)   (3)                        (1)      (1)             (1)
          33   =                       =
                    1        (1)   (2)   1 2                    (1)3          (1) 2 (2       2
                                                                                                 )
                    (1)      1     (1)
                    (2)      (1)    1

                            (1)(      (1)) 2
               =                                            , etc.
                   1 2      (1)3    (1) 2 (2        2
                                                        )
Thus, the ARMA(1,1) process is a stationary stochastic process where nei-
ther the autocorrelation nor the partial autocorrelation function breaks off.
   The following example shows how, due to measurement error, an
AR(1)-process becomes an ARMA(1,1) process.

Example 2.14
The ‘true’ variable x t is generated by a stationary AR(1) process,

(E2.8)                             xt =        xt   1       + ut ,

but it can only be measured with an error vt, i.e. for the observed variable xt it
holds that
(E2.9)                              xt = x t + vt ,

where vt is a pure random process uncorrelated with the random process ut. (The
same model was used in Example 2.3 but with a different interpretation.) If we
transform (E2.8) to
                                                    ut
                                     xt   =
                                               1            L
and insert it into (E2.9) we get
                           (1 – L) xt = ut + vt –                    vt-1 .
2.3 Mixed Processes                    75

For the combined error term         t    = ut + vt –                 vt-1 we get
                                                         2                         2      2
                                   (0) =                 u       + (1 +             )     v

                                                                     2
                                   (1) = -                           v

                                   ( ) = 0 for                                     2,
or
                                             2
                                             v
                  (1) =       2                  2           2
                                                                 ,            ( ) = 0 for                   2.
                              u      (1              )       v


Thus, the observable variable xt follows an ARMA(1,1) process,
                                  (1 –     L) xt = (1 – L)                               t   ,
where can be calculated by means of (1) and t is a pure random pro-
cess. (See also the corresponding results in Section 2.2.1.)


2.3.2 ARMA(p,q) Processes

The general autoregressive moving average process with AR order p and
MA order q can be written as
(2.51) xt =       +     1   xt-1 + ... +                 p       xt-p + ut –                     1   ut-1 – ... –     q    ut-q ,
with ut being a pure random process and                                        p       0 and           q   0 having to hold.
Using the lag operator, we can write
                                      p                                                                       q
(2.51')    (1 –   1L   – ... –      pL )    xt =                         + (1 –          1L      – ... –    qL )    ut ,
or
(2.51'')                            (L) xt =                             +     (L) ut .
As factors that are common in both polynomials can be reduced, (L) and
  (L) cannot have identical roots. The process is stationary if – with sto-
chastic initial conditions – the stability conditions of the AR term are ful-
filled, i.e. if (L) only has roots that are larger than 1 in absolute value.
Then we can derive the Wold representation for which
                         (L) =             (L)(1 +                       1L   +    2    L2 + ... )
must hold. Again, the j, j = 1, 2, ..., can be calculated by comparing the
coefficients. If, likewise, all roots of (L) are larger than 1 in absolute val-
ue, the ARMA(p,q) process is also invertible.
  A stationary and invertible ARMA(p,q) process may either be repre-
sented as an AR( ) or as an MA( ) process. Thus, neither its autocorrela-
76    Univariate Stationary Processes

tion nor its partial autocorrelation function breaks off. In short, it is possi-
ble to generate stationary stochastic processes with infinite AR and MA
orders by using only a finite number of parameters.
   Under the assumption of stationarity, (2.51) directly results in the con-
stant mean

                               E[xt] =                =                                .
                                                          1       1               p


If, without loss of generality, we set = 0 and thus also                                              = 0, we get the
following relation for the autocovariances:
     ( ) = E[xt- xt]
             = E[xt- (     1   xt-1 + ... +           p   xt-p + ut –             1   ut-1 – ... –          q   ut-q)] .
This relation can also be written as
          ( ) =     1 ( -1) +   2 ( -2) + ... +                               p       ( -p)
                       + E[xt- ut] –          1   E[xt- ut-1] – ... –                      q   E[xt- ut-q] .
Due to the Wold representation, the covariances between xt- and ut-i, i = 0,
..., q, are zero for > q, i.e. the autocovariances for > q and > p are gen-
erated by the difference equation of an AR(p) process,
     ( ) –     1   ( -1) –       2   ( -2) – ... –            p   ( -p) = 0 for > q                               >p
whereas the first q autocovariances are also influenced by the MA part.
Normalisation with (0) leads to exactly the same results for the autocorre-
lations.
   If the orders p and q are given and the distribution of the white noise
process ut is known, the parameters of an ARMA(p,q) process can be esti-
mated consistently by using maximum likelihood methods. These esti-
mates are also asymptotically efficient. If there is no such programme
available, it is possible to estimate the parameters consistently with least
squares. As every invertible ARMA(p,q) process is equivalent to an
AR( ) process, first of all an AR(k) process is estimated with k sufficient-
ly larger than p. From this, one can get estimates of the non-observable re-
siduals ût. By employing these residuals, the ARMA(p,q) process can be
estimated with the least squares method,
      xt =         +   1   xt-1 + ... +           p   xt-p –          1   ût-1 – ... –            q   ût-q + vt .
This approach can also be used if p and q are unknown. These orders can,
for example, be determined by using the information criteria shown in Sec-
tion 2.1.5.
2.3 Mixed Processes     77



         Percent
            8
            7
            6
            5
            4
            3
            2
            1
            0                                                              year
                1994       1996       1998        2000        2002
                       a) New York three month money market rate,
                          1994 – 2003
           1
         0.8
         0.6
         0.4
         0.2
           0
         -0.2               5             10             15           20
         -0.4
         -0.6
         -0.8            b) Autocorrelation (__) and partial ( )
           -1               autocorrelation functions of the first
                            differences with confidence intervals
           ˆ

           1
         0.8
         0.6
         0.4
         0.2
           0
         -0.2               5            10              15           20
         -0.4
         -0.6
         -0.8          c) Autocorrelation function of the residuals
          -1              of the estimated ARMA(1,1)-process
                          with confidence intervals



Figure 2.10: Three month money market rate in New York, 1994 – 2003
78    Univariate Stationary Processes

Example 2.15
Figure 2.10 shows the development of the US three month money market rate
(USR) as well as the estimated autocorrelation and partial autocorrelation function
of the first differences of this time series for the period from March 1994 to Au-
gust 2003 (114 observations). Both functions do not show a clear break-off behav-
iour. Therefore, the following ARMA(1,1) model has been estimated for this time
series:
             USRt =      – 0.006 + 0.831 USRt-1 + ût – 0.457 ût-1,.
                         (-0.73) (10.91)              (-3.57)
            R 2 = 0.351, SE = 0.166, Q(10) = 7.897 (p = 0.639).
The AR(1) as well as the MA(1) terms are different from zero and from one at any
usual significance level. The autocorrelogram of the estimated residuals, which is
also given in Figure 2.10, as well as the Ljung-Box Q statistic, which is calculated
for this model with 12 autocorrelation coefficients (i.e. with 10 degrees of free-
dom), do not provide any evidence of a higher order process.


2.4 Forecasting

As mentioned in the introduction, in the 1970’s, one of the reasons for the
broad acceptance of time series analysis using the Box-Jenkins approach
was the fact that forecasts with this comparably simple method often out-
performed forecasts generated by large econometric models. In the follow-
ing, we show how ARMA models can be used for making forecasts about
the future development of time series. In doing so, we assume that all ob-
servations of the time series up to time t are known.


2.4.1 Forecasts with Minimal Mean Squared Errors

We want to solve the problem of making a -step ahead forecast for xt with
a linear prediction function, given a stationary and/or invertible data gen-
erating process.
        ˆ                                                    ˆ
   Let x t ( ) be such a prediction function for xt+ . Thus, x t ( ) is a random
variable for given t and . As all stationary ARMA processes have a Wold
representation, we assume the existence of such a representation without
loss of generality. Thus,
                                                                2
                xt =    +         j   ut j ,   0   = 1,         j   <   ,
                            j 0                           j 0
2.4 Forecasting                    79

where ut is a pure random process with the usual properties E[ut] = 0,
                                                                        2
                                                                                    for t s
                                  E[utus] =                                                 .
                                                                       0            for t s

Therefore, it also holds that

(2.52)        x_{t+τ}  =  μ + Σ_{j=0}^{∞} ψj u_{t+τ-j} ,    τ = 1, 2, ... .


For a linear prediction function with the information given up to time t, we
assume the following representation,

(2.53)        x̂t(τ)  =  μ + Σ_{k=0}^{∞} ψ*_{τk} u_{t-k} ,    τ = 1, 2, ... ,


where the ψ*_{τk}, k = 0, 1, 2, ..., τ = 1, 2, ..., are unknown coefficients. The
forecast error of a τ-step forecast is ft(τ) = xt+τ – x̂t(τ), τ = 1, 2, ... . In order
to make a good forecast, these errors should be small. The expected quad-
ratic forecast error E[(xt+τ – x̂t(τ))²], which is to be minimised, is used as the
criterion to determine the unknowns ψ*_{τk}. Taking into account (2.52) and
(2.53) we can write

   E[ft(τ)²]  =  E[( Σ_{j=0}^{∞} ψj u_{t+τ-j}  –  Σ_{k=0}^{∞} ψ*_{τk} u_{t-k} )²]

              =  E[( u_{t+τ} + ψ1 u_{t+τ-1} + ... + ψ_{τ-1} u_{t+1}
                     + Σ_{k=0}^{∞} (ψ_{τ+k} – ψ*_{τk}) u_{t-k} )²] .

From this it follows that

(2.54)   E[ft(τ)²]  =  (1 + ψ1² + ... + ψ_{τ-1}²) σ²  +  σ² Σ_{k=0}^{∞} (ψ_{τ+k} – ψ*_{τk})² .


The variance of the forecast error reaches its minimum if we set ψ*_{τk} = ψ_{τ+k}
for k = 0, 1, 2, ... . Thus, we get the optimal linear prediction function for a
τ-step ahead forecast from (2.53) as

(2.55)        x̂t(τ)  =  μ + Σ_{k=0}^{∞} ψ_{τ+k} u_{t-k} ,    τ = 1, 2, ... .

For the conditional expectation of ut+s, given ut, ut-1, …, it holds that

              E[ut+s | ut, ut-1, ...]  =  ut+s   for s ≤ 0,
              E[ut+s | ut, ut-1, ...]  =  0      for s > 0.

Thus, because of (2.52), we get the conditional expectation of xt+τ as

              E[xt+τ | ut, ut-1, ...]  =  μ + Σ_{k=0}^{∞} ψ_{τ+k} u_{t-k} .

Due to (2.55), the conditional expectation of xt+τ, given all information
available at time t, is identical to the optimal prediction function. This
leads to the following result: the conditional expectation of xt+τ, given all
information up to time t, provides the τ-step forecast with minimal mean
squared prediction error.
   With (2.52) and (2.55) the τ-step forecast error can be written as

(2.56)   ft(τ)  =  xt+τ – x̂t(τ)  =  u_{t+τ} + ψ1 u_{t+τ-1} + ψ2 u_{t+τ-2} + ... + ψ_{τ-1} u_{t+1}

with

              E[ft(τ) | ut, ut-1, ...]  =  E[ft(τ)]  =  0 .
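
Before turning to the general conclusions below, a small numerical sketch of
(2.55) may be useful. It assumes a truncated sequence of ψ-weights and a finite,
invented history of shocks; everything in it is illustrative and not taken from the
text.

```python
import numpy as np

def wold_forecast(psi, u_hist, mu, tau):
    # (2.55): x̂_t(tau) = mu + sum_k psi[tau+k] * u_{t-k}, truncated at the
    # length of the available shock history u_hist = (u_t, u_{t-1}, ...)
    psi = np.asarray(psi, dtype=float)
    u_hist = np.asarray(u_hist, dtype=float)
    weights = psi[tau:tau + u_hist.size]
    return mu + np.dot(weights, u_hist[:weights.size])

# illustrative AR(1)-type weights psi_j = 0.8**j and invented shocks
psi = 0.8 ** np.arange(200)
rng = np.random.default_rng(1)
u_hist = rng.normal(size=150)                 # u_t, u_{t-1}, ...
for tau in (1, 2, 5, 20):
    print(tau, wold_forecast(psi, u_hist, mu=0.0, tau=tau))
```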
From these results we can immediately draw some conclusions:
1. Best linear unbiased predictions (BLUP) of stationary ARMA processes
   are given by the conditional expectation of xt+τ, τ = 1, 2, …,

              x̂t(τ)  =  E[xt+τ | xt, xt-1, ...]  =  Et[xt+τ] .
2. For the one-step forecast errors (τ = 1), ft(1) = ut+1, we get

              E[ft(1)]  =  E[ut+1]  =  0,  and

              E[ft(1) fs(1)]  =  E[ut+1 us+1]  =  σ²   for t = s,
              E[ft(1) fs(1)]  =  E[ut+1 us+1]  =  0    for t ≠ s.

   The one-step forecast errors are a pure random process; they are identi-
   cal with the residuals of the data generating process. If the one-step
   prediction errors were correlated, the prediction could be improved by
   using the information contained in the prediction errors. In such a case,
   however, x̂t(1) would not be an optimal forecast.
3. For the τ-step forecast errors (τ > 1) we get

              ft(τ)  =  u_{t+τ} + ψ1 u_{t+τ-1} + ψ2 u_{t+τ-2} + ... + ψ_{τ-1} u_{t+1} ,

   i.e. they follow an MA(τ-1) process with E[ft(τ)] = 0 and the variance

   (2.57)     V[ft(τ)]  =  (1 + ψ1² + ... + ψ_{τ-1}²) σ² .

   This variance can be used for constructing confidence intervals for τ-
   step forecasts. However, these intervals are too narrow for practical ap-
   plications because they do not take into account the uncertainty in the
   estimation of the parameters ψi, i = 1, 2, ..., τ-1.
4. It follows from (2.57) that the forecast error variance increases mono-
   tonically with increasing forecast horizon τ:

              V[ft(τ)]  ≥  V[ft(τ-1)] .
5. Due to (2.57) we get for the limit

     lim_{τ→∞} V[ft(τ)]  =  lim_{τ→∞} (1 + ψ1² + ... + ψ_{τ-1}²) σ²  =  σ² Σ_{j=0}^{∞} ψj²  =  V[xt] ,

   i.e. the variance of the τ-step forecast error is not larger than the vari-
   ance of the underlying process.
6. The following variance decomposition follows from (2.55) and (2.56)
   (a numerical check is sketched after these conclusions):

   (2.58)     V[xt+τ]  =  V[x̂t(τ)]  +  V[ft(τ)] .
7. Furthermore,

     lim_{τ→∞} x̂t(τ)  =  lim_{τ→∞} ( μ + Σ_{k=0}^{∞} ψ_{τ+k} u_{t-k} )  =  μ  =  E[xt] ,

   i.e. for increasing forecast horizons, the forecasts converge to the (un-
   conditional) mean of the series.
The concept of ‘weak’ rational expectations whose information set is re-
stricted to the current and past values of a variable exactly corresponds to
the optimal prediction approach used here.
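
Conclusions 4 to 7 are easy to check numerically. The sketch below uses assumed
AR(1) parameters, so that ψj = α^j; it evaluates the forecast error variance (2.57)
for increasing horizons and shows that it rises monotonically towards the process
variance σ²/(1 – α²), while the forecast variance implied by the decomposition
(2.58) shrinks to zero. All numbers are illustrative.

```python
import numpy as np

alpha, sigma2 = 0.8, 1.0                  # assumed AR(1) parameter and innovation variance
var_x = sigma2 / (1.0 - alpha ** 2)       # V[x_t]

for tau in (1, 2, 5, 10, 50):
    psi = alpha ** np.arange(tau)         # psi_0, ..., psi_{tau-1}
    var_f = sigma2 * np.sum(psi ** 2)     # (2.57): V[f_t(tau)]
    var_xhat = var_x - var_f              # (2.58): V[x̂_t(tau)]
    print(f"tau={tau:3d}  V[f]={var_f:.4f}  V[xhat]={var_xhat:.4f}  V[x]={var_x:.4f}")
```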


2.4.2 Forecasts of ARMA(p,q) Processes

The Wold decomposition employed in the previous section has advantages
when it comes to the derivation of theoretical results, but it is not practical-
ly useful for forecasting. Thus, in the following, we will discuss forecasts
directly using AR, MA, or ARMA representations.

Forecasts with a Stationary AR(1) Process

For this process, it holds that

              xt  =  δ + α xt-1 + ut ,

with |α| < 1. The optimal τ-step forecast is the conditional mean of xt+τ, i.e.

              Et[xt+τ]  =  Et[δ + α xt+τ-1 + ut+τ]  =  δ + α Et[xt+τ-1] .

Due to the first conclusion, we get the following first order difference
equation for the prediction function,

              x̂t(τ)  =  δ + α x̂t(τ-1) ,
which can be solved recursively:
     τ = 1:   x̂t(1)  =  δ + α x̂t(0)  =  δ + α xt

     τ = 2:   x̂t(2)  =  δ + α x̂t(1)  =  δ (1 + α) + α² xt

       ...

              x̂t(τ)  =  δ (1 + α + ... + α^{τ-1}) + α^τ xt

                      =  δ (1 – α^τ)/(1 – α) + α^τ xt
                      =  δ/(1 – α) + α^τ (xt – δ/(1 – α)) .

As μ = δ/(1 – α) is the mean of a stationary AR(1) process,

              x̂t(τ)  =  μ + α^τ (xt – μ)    with    lim_{τ→∞} x̂t(τ)  =  μ ,

i.e., with increasing forecast horizon τ, the predicted values of an AR(1)
process converge geometrically to the unconditional mean of the pro-
cess. The convergence is monotonic if α is positive, and oscillating if α is
negative.
   To calculate the τ-step prediction error, the Wold representation, i.e. the
MA(∞) representation of the AR(1) process, can be used,

              xt  =  μ + ut + α ut-1 + α² ut-2 + α³ ut-3 + ... .

Due to (2.56) and (2.57) we get the MA(τ-1) process

              ft(τ)  =  u_{t+τ} + α u_{t+τ-1} + α² u_{t+τ-2} + ... + α^{τ-1} u_{t+1}

for the forecast error with the variance

              V[ft(τ)]  =  (1 + α² + α⁴ + ... + α^{2(τ-1)}) σ²  =  σ² (1 – α^{2τ})/(1 – α²) .

With increasing forecast horizon, it follows that

              lim_{τ→∞} V[ft(τ)]  =  σ²/(1 – α²)  =  V[xt] ,

i.e. the prediction error variance converges to the variance of the AR(1)
process.
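
A short sketch of the AR(1) prediction function under assumed values of δ, α and
xt (all purely illustrative), comparing the recursion with the closed form
μ + α^τ (xt – μ):

```python
import numpy as np

def ar1_forecasts(delta, alpha, x_t, max_tau):
    # recursive forecasts x̂_t(tau) = delta + alpha * x̂_t(tau-1), with x̂_t(0) = x_t
    mu = delta / (1.0 - alpha)
    xhat, out = x_t, []
    for tau in range(1, max_tau + 1):
        xhat = delta + alpha * xhat
        closed_form = mu + alpha ** tau * (x_t - mu)   # same number as the recursion
        out.append((tau, xhat, closed_form))
    return mu, out

# illustrative parameter values
mu, forecasts = ar1_forecasts(delta=1.0, alpha=0.8, x_t=9.0, max_tau=20)
print("unconditional mean:", mu)                       # 5.0
for tau, rec, closed in forecasts[:3] + forecasts[-1:]:
    print(tau, round(rec, 4), round(closed, 4))        # both columns converge to mu
```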

Forecasts with Stationary AR(p) Processes

Starting with the representation

              xt  =  δ + α1 xt-1 + α2 xt-2 + ... + αp xt-p + ut ,

the conditional mean of xt+τ is given by

              Et[xt+τ]  =  δ + α1 Et[xt+τ-1] + ... + αp Et[xt+τ-p] .

Here,

              Et[xt+s]  =  x̂t(s)   for s > 0,
              Et[xt+s]  =  xt+s    for s ≤ 0.

Thus, the above difference equation can be solved recursively:

     τ = 1:   x̂t(1)  =  δ + α1 xt + α2 xt-1 + ... + αp xt+1-p

     τ = 2:   x̂t(2)  =  δ + α1 x̂t(1) + α2 xt + ... + αp xt+2-p ,  etc.
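
The same recursion in code, for a general AR(p): forecasts replace unknown
future values, observations are used where available. The coefficients and starting
values below are invented; x_hist holds the last p observations, most recent first.

```python
import numpy as np

def ar_p_forecasts(delta, alphas, x_hist, max_tau):
    # x_hist = (x_t, x_{t-1}, ..., x_{t-p+1})
    alphas = np.asarray(alphas, dtype=float)
    state = list(np.asarray(x_hist, dtype=float)[:alphas.size])
    forecasts = []
    for _ in range(max_tau):
        xhat = delta + float(np.dot(alphas, state))
        forecasts.append(xhat)
        state = [xhat] + state[:-1]       # the forecast becomes the "most recent" value
    return forecasts

# illustrative stationary AR(2): x_t = 1 + 1.5 x_{t-1} - 0.56 x_{t-2} + u_t
print(ar_p_forecasts(delta=1.0, alphas=[1.5, -0.56], x_hist=[18.0, 17.5], max_tau=5))
```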

Forecasts with an Invertible MA(1) Process

For this process, it holds that

              xt  =  μ + ut – β ut-1

with |β| < 1. The conditional mean of xt+τ is

              Et[xt+τ]  =  μ + Et[ut+τ] – β Et[ut+τ-1] .

For τ = 1, this leads to

(2.59)        x̂t(1)  =  μ – β ut ,

and for τ ≥ 2 we get

              x̂t(τ)  =  μ ,

i.e. the unconditional mean is the optimal forecast of xt+τ, τ = 2, 3, ... . For
the τ-step prediction errors and their variances we get:

              ft(1)  =  ut+1 ,                        V[ft(1)]  =  σ²
              ft(2)  =  ut+2 – β ut+1 ,               V[ft(2)]  =  (1 + β²) σ²
                ...
              ft(τ)  =  u_{t+τ} – β u_{t+τ-1} ,       V[ft(τ)]  =  (1 + β²) σ² .
To be able to perform the one-step forecasts (2.59), the unobservable vari-
able u has to be expressed as a function of the observable variable x. To do
this, it must be taken into account that for s ≤ t the one-step forecast errors
can be written as

(2.60)                     us  =  xs – x̂s-1(1) .

For t = 0, we get from (2.59)

                           x̂0(1)  =  μ – β u0

with the non-observable but fixed u0. Taking (2.60) into account, we get
for t = 1

     x̂1(1)  =  μ – β u1  =  μ – β (x1 – x̂0(1))
             =  μ – β x1 + β (μ – β u0)
             =  μ (1 + β) – β x1 – β² u0 .

Correspondingly, we get for t = 2

     x̂2(1)  =  μ – β u2  =  μ – β (x2 – x̂1(1))
             =  μ – β x2 + β (μ (1 + β) – β x1 – β² u0)
             =  μ (1 + β + β²) – β x2 – β² x1 – β³ u0 .

If we continue this procedure, the so-called backcasting, we finally arrive
at a representation of the one-step forecast which – except for u0 – con-
sists only of observable terms,

     x̂t(1)  =  μ (1 + β + ... + β^t) – β xt – β² xt-1 – ... – β^t x1 – β^{t+1} u0 .

Due to the invertibility of the MA(1) process, i.e. for |β| < 1, the impact of
the unknown initial value u0 finally disappears.
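
The backcasting recursion can be written in a few lines. The sketch below uses
invented MA(1) data and shows that two different (arbitrary) choices of u0 lead to
practically the same one-step forecast once t is large, as claimed above.

```python
import numpy as np

def ma1_one_step_forecasts(x, mu, beta, u0=0.0):
    # x̂_0(1) = mu - beta*u_0; then u_s = x_s - x̂_{s-1}(1) by (2.60) and x̂_s(1) = mu - beta*u_s
    xhat = mu - beta * u0
    forecasts = [xhat]
    for x_s in x:
        u_s = x_s - xhat
        xhat = mu - beta * u_s
        forecasts.append(xhat)
    return forecasts

# invented MA(1) data: x_t = mu + u_t - beta*u_{t-1}
rng = np.random.default_rng(2)
mu, beta = 5.0, 0.6
u = rng.normal(size=201)
x = mu + u[1:] - beta * u[:-1]
f_a = ma1_one_step_forecasts(x, mu, beta, u0=0.0)
f_b = ma1_one_step_forecasts(x, mu, beta, u0=10.0)
print(abs(f_a[-1] - f_b[-1]))         # negligible: the influence of u_0 has died out
```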
   Similarly, one can show that, after q forecast steps, the optimal forecasts
of invertible MA(q) processes, q > 1 are equal to the unconditional mean
of the process and that the variance of the forecast errors is equal to the
variance of the underlying process. The forecasts in observable terms are
represented similarly to those of the MA(1) process.

Forecasts with ARMA(p,q) Processes

Forecasts for these processes result from combining the approaches for
pure AR and MA processes. Thus, the one-step ahead forecast of a station-
ary and invertible ARMA(1,1) process is given by

              x̂t(1)  =  δ + α xt – β ut .

Starting with t = 0 and taking (2.60) into account, forecasts are successive-
ly generated by backcasting. We first get

              x̂0(1)  =  δ + α x0 – β u0 ,

where x0 and u0 are assumed to be any fixed numbers. For t = 1 we get

     x̂1(1)  =  δ + α x1 – β u1  =  δ + α x1 – β (x1 – x̂0(1))

             =  δ (1 + β) + (α – β) x1 + α β x0 – β² u0 ,

which finally leads to

(2.61)  x̂t(1)  =  δ (1 + β + ... + β^t) + (α – β) xt + β (α – β) xt-1 + ...
                   + β^{t-1} (α – β) x1 + α β^t x0 – β^{t+1} u0 .

Due to the invertibility condition, i.e. for |β| < 1, the one-step forecast for
large values of t no longer depends on the unknown initial values x0 and u0.
   For the τ-step forecast, τ = 2, 3, ..., we get

              x̂t(2)  =  δ + α x̂t(1)
              x̂t(3)  =  δ + α x̂t(2)
                 ...


Using (2.61), these forecasts can be calculated recursively.
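
Putting the pieces together for an ARMA(1,1): residuals are backcast via (2.60),
the one-step forecast follows x̂t(1) = δ + α xt – β ut, and longer horizons use
x̂t(τ) = δ + α x̂t(τ-1). All parameter values and the simulated series below are
illustrative assumptions.

```python
import numpy as np

def arma11_forecasts(x, delta, alpha, beta, max_tau, x0=0.0, u0=0.0):
    # backcasting: start from arbitrary fixed x0, u0 whose influence dies out for |beta| < 1
    xhat1 = delta + alpha * x0 - beta * u0        # x̂_0(1)
    for x_t in x:                                 # x_1, ..., x_T
        u_t = x_t - xhat1                         # (2.60)
        xhat1 = delta + alpha * x_t - beta * u_t  # x̂_t(1)
    forecasts = [xhat1]
    for _ in range(2, max_tau + 1):
        forecasts.append(delta + alpha * forecasts[-1])   # x̂_T(tau) = delta + alpha*x̂_T(tau-1)
    return forecasts

# simulate an illustrative ARMA(1,1) and forecast four steps ahead
rng = np.random.default_rng(3)
delta, alpha, beta = 1.0, 0.8, 0.4
u = rng.normal(size=300)
x = np.zeros(300)
for t in range(1, 300):
    x[t] = delta + alpha * x[t - 1] + u[t] - beta * u[t - 1]
print(arma11_forecasts(x[1:], delta, alpha, beta, max_tau=4))
```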


2.4.3 Evaluation of Forecasts

Forecasts can be evaluated ex post, i.e. when the realised values are avail-
able. There are many kinds of measures to do this. Quite often, only graphs
and/or scatter diagrams of the predicted values and the corresponding ob-
served values of a time series are plotted. Intuitively, a forecast is ‘good’ if
the predicted values describe the development of the series in the graphs
relatively well, or if the points in the scatter diagram are concentrated
around the angle bisecting line in the first and/or third quadrant. Such in-
tuitive arguments are, however, not founded on the above-mentioned con-
siderations on optimal predictions. For example, as (2.59) shows, the op-
timal one-step forecast of an MA(1) process is a pure random process.
This implies that the graphs compare two quite different processes. Con-
clusion 6 given above states that the following decomposition holds for the
variances of the data generating process, the forecasts and the forecast
errors,

              V[xt+τ]  =  V[x̂t(τ)]  +  V[ft(τ)] .
Thus, it is obvious that predicted and realised values are generally generat-
ed by different processes.
   As a result, a measure for the predictability of stationary processes can
be developed. It is defined as follows,

(2.62)        P(τ)²  =  V[x̂t(τ)] / V[xt]  =  1 – V[ft(τ)] / V[xt] ,

with 0 ≤ P(τ)² ≤ 1. At the same time, P(τ)² is the squared correlation coef-
ficient between the predicted and the realised values of x. The optimal
forecast of a pure random process with mean zero is x̂t(τ) = 0, i.e.
P(τ)² = 0. Such a process cannot be predicted. On the other hand, for the
one-step forecast of an MA(1) process, we can write

              P(1)²  =  β² σ² / ((1 + β²) σ²)  =  β² / (1 + β²)  >  0 .
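
In terms of the ψ-weights, (2.62) is one line of code. The sketch below uses
illustrative parameters and reproduces P(1)² = α² for an AR(1) and
P(1)² = β²/(1 + β²) for an MA(1).

```python
import numpy as np

def predictability(psi, tau, sigma2=1.0):
    # (2.62): P(tau)^2 = 1 - V[f_t(tau)] / V[x_t], both variances from the psi-weights
    psi = np.asarray(psi, dtype=float)
    var_x = sigma2 * np.sum(psi ** 2)
    var_f = sigma2 * np.sum(psi[:tau] ** 2)       # (2.57)
    return 1.0 - var_f / var_x

alpha, beta = 0.8, 0.5                            # illustrative parameters
psi_ar1 = alpha ** np.arange(500)                 # AR(1): psi_j = alpha**j
psi_ma1 = np.array([1.0, -beta])                  # MA(1): psi_0 = 1, psi_1 = -beta
print(predictability(psi_ar1, tau=1))             # = alpha**2 = 0.64
print(predictability(psi_ma1, tau=1))             # = beta**2/(1+beta**2) = 0.2
```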
However, the decomposition (2.58), which is theoretically valid for opti-
mal forecasts, does not hold for actual (empirical) forecasts, even if they
are generated by using (estimated) ARMA processes. This is due to the
fact that forecast errors are hardly ever totally uncorrelated with the fore-
casts. Therefore, the value of P(τ)² might even become negative for ‘bad’
forecasts.
   JACOB MINCER and VICTOR ZARNOWITZ (1969) made the following
suggestion to check the consistency of forecasts. Using OLS, the following
regression equation is estimated:

(2.63)        xt+τ  =  a0 + a1 x̂t(τ) + εt+τ .

It is tested, either individually with t tests or jointly with an F test, whether
a0 = 0 and a1 = 1. If this is fulfilled, the forecasts are said to be consistent.
However, such a regression produces consistent estimates of the parame-
ters if and only if x̂t(τ) and εt+τ are asymptotically uncorrelated.

Moreover, to get consistent estimates of the variances, which is necessary
for the validity of the test results, the residuals have to be pure random
processes. Even under the null hypothesis of optimal forecasts, this only
holds for one-step predictions. Thus, the usual F and t tests can only be
used for τ = 1. For τ > 1, the MA(τ-1) process of the forecast errors has to
be taken into account when the variances are estimated. A procedure for
such situations combines Ordinary Least Squares for the estimation of the
parameters and Generalised Least Squares for the estimation of the vari-
ances, as proposed by BRYAN W. BROWN and SHLOMO MAITAL (1981).
   JINOOK JEONG and GANGADHARRAO S. MADDALA (1991) have pointed
out another problem related to these tests. Even rational forecasts are usu-
ally not without errors; they contain measurement errors. This implies,
however, that (2.63) cannot be estimated consistently with OLS; an in-
strumental variables estimator must be used. An alternative to the estima-
tion of (2.63) is therefore to estimate a univariate MA(τ-1) model for the
forecast errors of a τ-step prediction,

              ft(τ)  =  a0 + ut + a1 ut-1 + a2 ut-2 + ... + a_{τ-1} u_{t-τ+1} ,

and to check the null hypothesis H0: a0 = 0 as well as whether the estimat-
ed residuals ût are white noise.
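
For one-step forecasts (τ = 1) the Mincer-Zarnowitz regression (2.63) with the
joint hypothesis a0 = 0, a1 = 1 can be run with plain OLS; the series below are
invented. For τ > 1 the MA(τ-1) structure of the errors would have to be taken
into account, as just discussed.

```python
import numpy as np
from scipy.stats import f as f_dist

def mincer_zarnowitz(realised, forecast):
    # OLS of x_{t+1} on a constant and x̂_t(1); F test of H0: a0 = 0 and a1 = 1
    y = np.asarray(realised, dtype=float)
    xhat = np.asarray(forecast, dtype=float)
    X = np.column_stack([np.ones_like(xhat), xhat])
    coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    rss_u = np.sum((y - X @ coef) ** 2)           # unrestricted residual sum of squares
    rss_r = np.sum((y - xhat) ** 2)               # restricted model: a0 = 0, a1 = 1
    n = y.size
    F = ((rss_r - rss_u) / 2.0) / (rss_u / (n - 2))
    return coef, F, f_dist.sf(F, 2, n - 2)

rng = np.random.default_rng(4)
xhat = rng.normal(size=120)                       # hypothetical one-step forecasts
x_real = xhat + rng.normal(scale=0.5, size=120)   # realisations = forecasts + error
print(mincer_zarnowitz(x_real, xhat))             # a0 near 0, a1 near 1, large p-value
```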
   On the other hand, simple descriptive measures, which are often em-
ployed to evaluate the performance of forecasts, are based on the average
values of the forecast errors over the forecast horizon. The simple arithme-
tic mean indicates whether the values of the variable are – on average –
over- or underestimated. However, the disadvantage of this measure is that
large over- and underestimates cancel each other out. The mean absolute
error is often used to avoid this effect. Starting the forecasts from a fixed
point of time, t0, and assuming that realisations are available up to t0+m,
we get

              MAE(τ)  =  (1/(m+1)) Σ_{j=0}^{m} | f_{t0+j}(τ) | ,    τ = 1, 2, ... .

Every forecast error gets the same weight in this measure. The root mean
square error is often used to give particularly large errors a stronger
weight:

              RMSE(τ)  =  [ (1/(m+1)) Σ_{j=0}^{m} f_{t0+j}(τ)² ]^{1/2} ,    τ = 1, 2, ... .

These measures are not normalised, i.e. their size depends on the scale of
the data.

   The inequality measure proposed by HENRY THEIL (1961) avoids this
problem by comparing the actual forecasts with so-called naïve forecasts,
i.e. the realised value of the last available observation,

              U(τ)  =  [ Σ_{j=0}^{m} f_{t0+j}(τ)²  /  Σ_{j=0}^{m} (x_{t0+j+τ} – x_{t0+j})² ]^{1/2} ,    τ = 1, 2, ... .

If U(τ) = 1, the forecast is as good as the naïve forecast, x̂t(τ) = xt. For
U(τ) < 1 the forecasts perform better than the naïve one. MAE, RMSE and
Theil’s U all become zero if predicted and realised values are identical
over the whole forecast horizon.
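
The three descriptive measures then only require the forecast errors and the
corresponding naïve ('no change') errors; a sketch with invented error series:

```python
import numpy as np

def mae(errors):
    return np.mean(np.abs(errors))

def rmse(errors):
    return np.sqrt(np.mean(np.square(errors)))

def theil_u(errors, naive_errors):
    # Theil's U: root of the error sum of squares relative to the naive forecast
    return np.sqrt(np.sum(np.square(errors)) / np.sum(np.square(naive_errors)))

rng = np.random.default_rng(5)
f_err = rng.normal(scale=1.0, size=26)            # invented forecast errors f_{t0+j}(1)
naive_err = rng.normal(scale=1.8, size=26)        # invented errors of the naive forecast
print(mae(f_err), rmse(f_err), theil_u(f_err, naive_err))   # U < 1: better than naive
```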

Example 2.16
All these measures can also be applied to forecasts which are not generated by
ARMA models, as, for example, the forecasts of the Council of Economic Experts
or the Association of German Economic Research Institutes. Since the end of the
1960’s, both institutions have published forecasts of the German economic devel-
opment for the following year, the institutes usually in October and the Council at
the end of November. HANNS MARTIN HAGEN and GEBHARD KIRCHGÄSSNER
(1996) investigated the annual forecasts of the growth rates of GNP for the period
from 1970 to 1995 as well as for the sub-periods from 1970 to 1982 and from
1983 to 1995. These periods correspond to the social-liberal government of SPD
and FDP and the conservative-liberal government of CDU/CSU and FDP.
   The results are given in Table 2.2. Besides the criteria given above, the table al-
so indicates the square of the correlation coefficient between realised and predict-
ed values (R2), the estimated regression coefficient â1 of the test equation (2.63) as
well as the mean error (ME). According to almost all criteria, the forecasts of the
Council outperform those of the institutes. This was to be expected, as the Coun-
cil’s forecasts are produced slightly later, at a time when more information is
available. It holds for the forecasts of both institutions that the mean absolute er-
ror, the root mean squared error as well as Theil's U are smaller in the second pe-
riod compared to the first one. This is some evidence that the forecasts might have
improved over time. On the other hand, the correlation coefficient between pre-
dicted and realised values has also become smaller. This indicates a deterioration
of the forecasts. It has to be taken into account that the variance of the variable to
be predicted was considerably smaller in the second period as compared to the
first one. Thus, the smaller errors do not necessarily indicate improvements of the
forecasts. It is also interesting to note that on average the forecast errors of both
institutions were negative in the first and positive in the second sub-period. They
tended to overestimate the development in the period of the social-liberal coalition
and to underestimate it in the period of the conservative-liberal coalition.


              Table 2.2: Forecasts of the Council of Economic Experts
                         and of the Economic Research Institutes

                  Period        R2      RMSE     MAE      ME       â1       U

                1970 – 1995    0.369    1.838    1.346   -0.250*  1.005*   0.572
  Institutes    1970 – 1982    0.429    2.291    1.654   -0.731   1.193*   0.625
                1983 – 1995    0.399    1.229    1.038    0.231   1.081    0.457

  Council of    1970 – 1995    0.502*   1.647*   1.171*  -0.256   1.114    0.512*
  Economic      1970 – 1982    0.599*   2.025*   1.477*  -0.723*  1.354    0.552*
  Experts       1983 – 1995    0.472*   1.150*   0.865*   0.212*  1.036*   0.428*

   ‘*’ denotes the ‘better’ of the two forecasts.




2.5 The Relation between Econometric Models and
    ARMA Processes

The ARMA model-based forecasts discussed in the previous section are
unconditional forecasts. The only information that is used to generate the-
se forecasts is the information contained in the current and past values of
the time series. There is demand for such forecasts, and – as mentioned
above – one of the reasons for the development and the popularity of the
Box-Jenkins methodology presented in this chapter is that by applying the
above-mentioned approaches, these predictions perform – at least partly –
much better than forecasts generated by large scale econometric models.
Thus, the Box-Jenkins methodology seems to be a (possibly much better)
alternative to the traditional econometric methodology.
   However, this perspective is rather restricted. On the one hand, condi-
tional rather than unconditional forecasts are required in many cases, for
example, in order to evaluate the effect of a tax reform on economic
growth. Such forecasts cannot be generated by using (only) univariate
models. On the other hand, and more importantly, the separation of the two
approaches is much less strict than it seems to be at first glance. As
ARNOLD ZELLNER and FRANZ C. PALM (1974) showed, linear dynamic
simultaneous equation systems as used in traditional econometrics can be
transformed into ARMA models. (Inversely, multivariate time series mod-
els as discussed in the next chapters can be transformed into traditional
econometric models.) The univariate ARMA models correspond to the
final equations of econometric models in the terminology of JAN
TINBERGEN (1940).
   Let us consider a very simple model. An exogenous, weakly stationary
variable x, as defined in (2.64b), has a current and lagged impact on the
dependent variable y, while the error term might be autocorrelated. Thus,
we get the model

(2.64a)       yt  =  δ1(L) xt + δ2(L) u1,t ,

(2.64b)       α(L) xt  =  β(L) u2,t ,

where δ1(L) and δ2(L) are lag polynomials of finite order. If we insert
(2.64b) into (2.64a), we get for y the univariate model

(2.64a')      α(L) yt  =  ω(L) vt

with

              ω(L) vt  :=  δ1(L) β(L) u2,t + δ2(L) α(L) u1,t .

As ω(L) vt is an MA process of finite order, we get a finite order ARMA
representation for y. It must be pointed out that the univariate representa-
tions of the two variables have the same finite order AR term.
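
The step from (2.64a)/(2.64b) to (2.64a') is nothing but polynomial multiplication
in the lag operator, which can be checked with np.convolve. The low-order
polynomials below are invented and use the notation of the reconstructed
equations; a coefficient array [1, -0.7] stands for 1 – 0.7L.

```python
import numpy as np

alpha  = np.array([1.0, -0.7])    # alpha(L)  = 1 - 0.7L   (AR polynomial of x)
beta   = np.array([1.0,  0.4])    # beta(L)   = 1 + 0.4L   (MA polynomial of x)
delta1 = np.array([0.5,  0.3])    # delta1(L) = 0.5 + 0.3L
delta2 = np.array([1.0, -0.2])    # delta2(L) = 1 - 0.2L

# alpha(L) y_t = delta1(L) beta(L) u_{2,t} + delta2(L) alpha(L) u_{1,t}
ma_u2 = np.convolve(delta1, beta)     # coefficients of delta1(L)*beta(L)
ma_u1 = np.convolve(delta2, alpha)    # coefficients of delta2(L)*alpha(L)

print("AR part of y :", alpha)        # same finite-order AR polynomial as x
print("MA part (u2) :", ma_u2)        # both MA operators are of finite order,
print("MA part (u1) :", ma_u1)        # so y has a finite-order ARMA representation
```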


References

Since the time when HERMAN WOLD developed the class of ARMA processes in
his dissertation and GEORGE E.P. BOX and GWILYM M. JENKINS (1970) popular-
ised and further developed this model class in the textbook mentioned above, there
have been quite a lot of textbooks dealing with these models at different technical
levels. An introduction focusing on empirical applications is, for example, to be
found in
ROBERT S. PINDYCK and DANIEL L. RUBINFELD, Econometric Models and Eco-
   nomic Forecasts, McGraw-Hill, Boston et al., 4th edition 1998, Chapter 17f.
   pp. 521 – 578,
PETER J. BROCKWELL and RICHARD A. DAVIS, Introduction to Time Series and
   Forecasting, Springer, New York et al. 1996, as well as
TERENCE C. MILLS, Time Series Techniques for Economists, Cambridge Universi-
   ty Press, Cambridge (England) 1990. Contrary to this,
PETER J. BROCKWELL and RICHARD A. DAVIS, Time Series: Theory and Methods,
   Springer, New York et al. 1987,
give a rigorous presentation in probability theory. Along with the respective
proofs of the theorems, this textbook shows, however, many empirical examples.

  Autoregressive processes for the residuals of an estimated regression equation
were used for the first time in econometrics by
DONALD COCHRANE and GUY H. ORCUTT, Application of Least Squares Regres-
  sion to Relationships Containing Autocorrelated Error Terms, Journal of the
  American Statistical Association 44 (1949), pp. 32 – 61.
The different information criteria to detect the order of an autoregressive process
are presented in
HIROTUGU AKAIKE, Fitting Autoregressive Models for Prediction, Annals of the
   Institute of Statistical Mathematics 21 (1969), pp. 243 – 247,
HIROTUGU AKAIKE, A New Look at the Statistical Model Identification, IEEE
   Transactions on Automatic Control AC-19 (1974), pp. 716 – 723,
GIDEON SCHWARZ, Estimating the Dimensions of a Model, Annals of Statistics 6
   (1978), pp. 461 – 464, as well as in
EDWARD J. HANNAN and BARRY G. QUINN, The Determination of the Order of an
   Autoregression, Journal of the Royal Statistical Society B 41 (1979), pp. 190
   – 195.
The effect of temporal aggregation on the first differences of temporal averages
was first investigated by
HOLBROOK WORKING, Note on the Correlation of First Differences of Averages in
   a Random Chain, Econometrica 28 (1960), pp. 916 – 918
and later on, in more detail, by
GEORGE C. TIAO, Asymptotic Behaviour of Temporal Aggregates of Time Series,
   Biometrika 59 (1972), pp. 525 – 531.
The approach to check the consistency of predictions was developed by
JACOB MINCER and VICTOR ZARNOWITZ, The Evaluation of Economic Forecasts,
   in: J. MINCER (ed.), Economic Forecasts and Expectations, National Bureau
   of Economic Research, New York 1969.
The use of MA processes of the forecast errors to estimate the variances of the es-
timated parameters was presented by
BRYAN W. BROWN and SHLOMO MAITAL, What Do Economists Know? An Em-
   pirical Study of Experts’ Expectations, Econometrica 49 (1981), pp. 491 –
   504.
The fact that measurement errors also play a role in rational forecasts and that,
therefore, instrumental variable estimators should be used, was indicated by
JINOOK JEONG and GANGADHARRAO S. MADDALA, Measurement Errors and Tests
    for Rationality, Journal of Business and Economic Statistics 9 (1991), pp. 431
    – 439.

These procedures have been applied to the common forecasts of the German eco-
nomic research institutes by
GEBHARD KIRCHGÄSSNER, Testing Weak Rationality of Forecasts with Different
   Time Horizons, Journal of Forecasting 12 (1993), pp. 541 – 558.
Moreover, the forecasts of the German Council of Economic Experts as well as
those of the German Economic Research Institutes were investigated in
HANNS MARTIN HAGEN and GEBHARD KIRCHGÄSSNER, Interest Rate Based Fore-
   casts of German Economic Growth: A Note, Weltwirtschaftliches Archiv 132
   (1996), pp. 763 – 773.
The measure of inequality (Theil’s U) was proposed by
HENRY THEIL, Economic Forecasts and Policy, North-Holland, Amsterdam 1961.
An alternative measure is given in
HENRY THEIL, Applied Economic Forecasting, North-Holland, Amsterdam 1966.
Today, both measures are used in computer programmes. Quite generally, fore-
casts for time series data are discussed in
CLIVE W.J. GRANGER, Forecasting in Business and Economics, Academic Press,
   2nd edition 1989.
On the evaluation of the predictive accuracy of forecasts see
FRANCIS X. DIEBOLD and ROBERTO S. MARIANO, Comparing Predictive Accuracy,
   Journal of Business and Economic Statistics 13 (1995), pp. 253 – 263.
The relationship between time series models and econometric equation sys-
tems is analysed in
ARNOLD ZELLNER and FRANZ C. PALM, Time Series Analysis and Simultaneous
   Equation Econometric Models, Journal of Econometrics 2 (1974), pp. 17 –
   54.
See for this also
FRANZ C. PALM, Structural Econometric Modeling and Time Series Analysis: An
   Integrated Approach, in: A. ZELLNER (ed.), Applied Time Series Analysis of
   Economic Data, U.S. Department of Commerce, Economic Research Report
   ER-S, Washington 1983, pp. 199 – 230.
The term final equation originates from
JAN TINBERGEN, Econometric Business Cycle Research, Review of Economic
   Studies 7 (1940), pp. 73 – 90.
An introduction into the solution of difference equations is given in
WALTER ENDERS, Applied Econometric Time Series, 3rd edition, Wiley, Hoboken,
  N.J. 2010, Chapter 1.

The permanent income hypothesis as a determinant of consumption expenditure
was developed by
MILTON FRIEDMAN, A Theory of the Consumption Function, Princeton University
   Press, Princeton N.J. 1957.
The example of the estimated popularity function is given in
GEBHARD KIRCHGÄSSNER, Causality Testing of the Popularity Function: An Em-
   pirical Investigation for the Federal Republic of Germany, 1971 – 1982, Pub-
   lic Choice 45 (1985), pp. 155 – 173.
http://www.springer.com/978-3-642-33435-1

Figure 2.5: AR(2) process with α1 = 1.5, α2 = −0.56
a) Realisation
b) Theoretical autocorrelation function
c) Estimated autocorrelation function with confidence intervals
Figure 2.6: AR(2) process with α1 = 1.4 and α2 = −0.85
a) Realisation
b) Theoretical autocorrelation function
c) Estimated autocorrelation function with confidence intervals
  • 21. 2.1 Autoregressive Processes 47 is the frequency of the oscillation. The period of the cycles is P = 2 /f. Processes with conjugate complex roots are well-suited to describe busi- ness cycle fluctuations. Example 2.5 Consider the AR(2) process (E2.4) xt = 1.4 xt-1 – 0.85 xt-2 + ut, with a variance of ut of 1. The characteristic equation 2 – 1.4 + 0.85 = 0 has the two solutions 1 = 0.7 + 0.6i and 2 = 0.7- 0.6i. (‘i’ stands for the imagi- nary unit: i2 = - 1.) The modulus (dampening factor) is d = 0.922. Thus, (E2.4) with stochastic initial conditions and a mean of zero is stationary. According to (2.24) the variance is given by (0) = 8.433. A realisation of this process with 180 observations is given in Figure 2.6. Its development is cyclical around its zero mean. For the autocorrelation function we get ( ) – 1.4 ( -1) + 0.85 ( -2) = 0, = 2, 3, ..., (0) = 1, (1) = 0.76, because of (2.25). The general solution is ( ) = 0.922 (C1 cos (0.709 ) + C2 sin (0.709 )) . Taking into account the two initial conditions, we get for the autocorrelation coef- ficients ( ) = 0.922 (cos (0.709 ) + 0.1 sin (0.709 )) , with a frequency of f = 0.709. In case of quarterly data, this corresponds to a period length of about 9 quarters. Both the theoretical and the estimated autocorrelations in Figure 2.6 show this kind of dampened periodical behaviour. Example 2.6 Figure 2.7 shows the development of the three month money market rate in Frank- furt (GSR) from the first quarter of 1970 to the last quarter of 1998 as well as the autocorrelation and the partial autocorrelation functions explained in Section 2.1.4. Whereas the autocorrelation function tends only slowly towards zero, the partial autocorrelation function breaks off after two lags. As will be shown below, this indicates an AR(2) process. For the period from 1970 to 1998, estimation with OLS results in the following:
Figure 2.7: Three month money market rate in Frankfurt, 1970 – 1998
a) Three month money market rate in Frankfurt 1970 – 1998 (percent)
b) Estimated autocorrelation and partial autocorrelation functions with confidence intervals
c) Estimated autocorrelation function of the residuals of the estimated AR(2) process with confidence intervals
  • 23. 2.1 Autoregressive Processes 49 GSRt = 0.575 + 1.407 GSRt-1 – 0.498 GSRt-2 + ût,. (2.82) (17.50) (-6.16) R 2 = 0.910, SE = 0.812, Q(6) = 6.475 (p = 0.372), with t values being again given in parentheses. On the 0.1 percent level, both es- timated coefficients of the lagged interest rates are significantly different from ze- ro. The autocorrelogram of the estimated residuals (given in Figure 2.7c) as well as the Ljung-Box Q statistic which is calculated with 8 correlation coefficients (and 6 degrees of freedom) does not indicate any higher order process. The two roots of the process are 0.70 ± 0.06i, i.e. they indicate dampened cycles. The modulus (dampening factor) is d = 0.706; the frequency f = 0.079 corresponds to a period of 79.7 quarters and therefore of nearly 20 years. Correspondingly, this oscillation cannot be detected in the estimated autocorrelogram presented in Fig- ure 2.7b. 2.1.3 Higher Order Autoregressive Processes An AR(p) process can be described by the following stochastic difference equation, (2.26) xt = + 1 xt-1 + 2 xt-2 + ... + p xt-p + ut, with p 0, where ut is again a pure random process with zero mean and variance 2. Using the lag operator we can also write: (2.26') (1 – 1 L– 2 L2 – ... – p Lp) xt = + ut. If we assume stochastic initial conditions, the AR(p) process in (2.26) is stationary if the stability conditions are satisfied, i.e. if the characteristic equation p p-1 p-2 (2.27) – 1 – 2 – ... – p = 0 only has roots with absolute values smaller than one, or if the solutions of the lag polynomial (2.28) 1– 1 L– 2 L2 – ... – p Lp = 0 only have roots with absolute values larger than one. If the stability conditions are satisfied, we get the Wold representation of the AR(p) process by the series expansion of the inverse lag polynomial, 1 2 p = 1+ 1L + 2L + ... 1 1L ... pL as
  • 24. 50 Univariate Stationary Processes (2.29) xt = j ut j . 1 1 ... p j 0 Generalising the approach that was used to calculate the coefficients of the AR(2) process, the series expansion can again be calculated by the method of undetermined coefficients. From (2.29) we get the constant (unconditional) expectation as E[xt] = = . 1 1 ... p Again, similarly to the AR(1) and AR(2) cases, a necessary condition for stability is 1– 1 – 2 – ... – p > 0. Without loss of generality we can set = 0, i.e. = 0, in order to calcu- late the autocovariances. Because of ( ) = E[xt- xt], we get according to (2.26) (2.30) ( ) = E[xt- ( 1 xt-1 + 2 xt-2 + ... + p xt-p + ut)] . For = 0, 1, ... , p, it holds that 2 (0) 1 (1) 2 (2) p (p) (1) 1 (0) 2 (1) p (p 1) (2.31) (p) 1 (p 1) 2 (p 2) p (0) because of the symmetry of the autocovariances and because of E[xt- ut] = 2 for = 0 and zero for > 0. This is a linear inhomogeneous equation system for given i and 2 to derive the p + 1 unknowns (0), (1), ..., (p). For > p we get the linear homogeneous difference equation to calculate the autocovariances of order > p: (2.32) ( )– 1 ( -1) – ... – p ( -p) = 0. If we divide (2.32) by (0), we get the corresponding difference equation to calculate the autocorrelations: (2.33) ( )– 1 ( -1) – ... – p ( -p) = 0. The initial conditions (1), (2), ..., (p) can be derived from the so-called Yule-Walker equations. We get those if we successively insert = 1, 2, ..., p in (2.33), or, if the last p equations in (2.31) are divided by (0),
  • 25. 2.1 Autoregressive Processes 51 (1) = 1 + 2 (1) + 3 (2) + ... + p (p-1) (2) = 1 (1) + 2 + 3 (1) + ... + p (p-2) (2.34) (p) = 1 (p-1) + 2 (p-2) + 3 (p-3) + ... + p If we define ' = ( (1), (2), ..., (p)), ' = ( 1, 2, ..., p) and 1 (1) (2) (p 1) (1) 1 (1) (p 2) R p p (p 1) (p 2) (p 3) 1 we can write the Yule-Walker equations (2.34) in matrix form, (2.35) = R . If the first p autocorrelation coefficients are given, the coefficients of the AR(p) process can be calculated according to (2.35) as (2.36) = R-1 . Equations (2.35) and (2.36) show that there is a one-to-one mapping be- tween the p coefficients and the first p autocorrelation coefficients of an AR(p) process. If there is a generating pure random process, it is suffi- cient to know either or to identify the AR(p) process. Thus, there are two possibilities to describe the structure of an autoregressive process of order p: the parametric representation that uses the parameters 1, 2, ..., p, and the non-parametric representation with the first p autocorrelation coef- ficients (1), (2), ..., (p). Both representations contain exactly the same information. Which representation is used depends on the specific situa- tion. We usually use the parametric representation to describe finite order autoregressive processes (with known order). Example 2.7 Let the fourth order autoregressive process xt = 4 xt-4 + ut, 0 < 4 < 1, 2 be given, where ut is again white noise with zero mean and variance . Applying (2.31) we get: 2 (0) = 4 (4) + , (1) = 4 (3), (2) = 4 (2),
  • 26. 52 Univariate Stationary Processes (3) = 4 (1), (4) = 4 (0). From these relations we get 2 (0) = 2 , 1 4 (1) = (2) = (3) = 0, 2 (4) = 4 2 . 1 4 As can easily be seen, only the autocovariances with lag = 4j, j = 1, 2, ... are dif- ferent from zero, while all other autocovariances are zero. Thus, for > 0 we get the autocorrelation function j 4 for 4 j, j 1, 2, ... ( ) = . 0 elsewhere. Only every fourth autocorrelation coefficient is different from zero; the sequence of these autocorrelation coefficients decreases monotonically like a geometric se- ries. Employing such a model for quarterly data, this AR(4) process captures the correlation between random variables that are distant from each other by a multi- plicity of four periods, i.e. the structure of the correlations of all variables which belong to the i-th quarter of a year, i = 1, 2, 3, 4, follows an AR(1) process while the correlations between variables that belong to different quarters are always ze- ro. Such an AR(4) process provides a simple possibility of modelling seasonal ef- fects which typically influence the same quarters of different years. For empirical applications, it is advisable to first eliminate the deterministic component of a sea- sonal variation by employing seasonal dummies and then to model the remaining seasonal effects by such an AR(4) process. 2.1.4 The Partial Autocorrelation Function Due to the stability conditions, autocorrelation functions of stationary fi- nite order autoregressive processes are always sequences that converge to zero but do not break off. This makes it difficult to distinguish between processes of different orders when using the autocorrelation function. To cope with this problem, we introduce a new concept, the partial autocorre- lation function. The partial correlation between two random variables is the correlation that remains if the possible impact of all other random vari- ables has been eliminated. To define the partial autocorrelation coefficient, we use the new notation,
  • 27. 2.1 Autoregressive Processes 53 xt = k1xt-1 + k2xt-2 + … + kkxt-k + ut , where ki is the coefficient of the variable with lag i if the process has or- der k. (According to the former notation it holds that i = ki i = 1,2,…,k.) The coefficients kk are the partial autocorrelation coefficients (of order k), k = 1,2,… . The partial autocorrelation measures the correlation between xt and xt-k which remains when the influences of xt-1, xt-2, ..., xt-k+1 on xt and xt-k have been eliminated. Due to the Yule-Walker equations (2.35), we can derive the partial au- tocorrelation coefficients kk from the autocorrelation coefficients if we calculate the coefficients kk, which belong to xt-k, for k = 1, 2, ... from the corresponding linear equation systems 1 (1) (2) (k 1) k1 (1) (1) 1 (2) (k 2) k2 (2) , k = 1, 2, ... . (k 1) (k 2) (k 3) 1 kk (k) With Cramer’s rule we get 1 (1) (1) (1) 1 (2) (k 1) (k 2) (k) (2.37) kk , k = 1, 2, ... . 1 (1) (k 1) (1) 1 (k 2) (k 1) (k 2) 1 Thus, if the data generating process (DGP) is an AR(1) process, we get for the partial autocorrelation function: 11 = (1) 1 (1) (1) (2) (2) (1) 2 22 = = = 0, 1 (1) 1 (1) 2 (1) 1
  • 28. 54 Univariate Stationary Processes because of (2) = (1)2. Generally, the partial autocorrelation coefficients kk = 0 for k >1 in an AR(1) process. If the DGP is an AR(2) process, we get (2) (1)2 11 = (1), 22 = , kk = 0 for k > 2 . 1 (1)2 The same is true for an AR(p) process: all partial autocorrelation coeffi- cients of order higher than p are zero. Thus, for finite order autoregressive processes, the partial autocorrelation function provides the possibility of identifying the order of the process by the order of the last non-zero partial autocorrelation coefficient. We can estimate the partial autocorrelation co- efficients consistently by substituting the theoretical values in (2.37) by their consistent estimates (1.10). For the partial autocorrelation coefficients which have a theoretical value of zero, i.e. the order of which is larger than the order of the process, we get asymptotically that they are normally dis- tributed with E[ ˆ kk ] = 0 and V[ ˆ kk ] = 1/T for k > p . Example 2.8 The AR(1) process of Example 2.1 has the following theoretical partial autocorre- lation function: 11 = (1) = and zero elsewhere. In this example, takes on the values 0.9, 0.5 and -0.9. The estimates of the partial autocorrelation functions for the realisations in Figures 2.1 and 2.3 are presented in Figure 2.8. It is obvious for both processes that these are AR(1) processes. The estimated value for the process with = 0.9 is ˆ 11 = 0.91, while all other partial autocorrelation coefficients are not significantly different from zero. We get ˆ = -0.91 for the process with 11 = -0.9, while all estimated higher order partial autocorrelation coefficients do not deviate significantly from zero. The AR(2) process of Example 2.4 has the following theoretical partial auto- correlation function: 11 = 0.96, 22 = -0.56 and zero elsewhere. The realisation of this process, which is given in Figure 2.5, leads to the empirical partial autocorre- lation function in Figure 2.8. It corresponds quite closely to the theoretical func- tion; we get ˆ 11 = 0.95 and ˆ 22 = -0.60 and all higher order partial autocorrelation coefficients are not significantly different from zero. The same holds for the AR(2) process with the theoretical non-zero partial autocorrelations 11 = 0.76 and ˆ ˆ 22 = -0.85 given in Example 2.5. We get the estimates 11 = 0.76 and 22 = -0.78, whereas all higher order partial correlation coefficients are not significantly differ- ent from zero.
Figure 2.8: Estimated partial autocorrelation functions
AR(1) process with α = 0.9; AR(1) process with α = −0.9; AR(2) process with α1 = 1.5, α2 = −0.56; AR(2) process with α1 = 1.4, α2 = −0.85
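Estimated partial autocorrelation functions such as those in Figure 2.8 can be reproduced directly from formula (2.37) once the theoretical autocorrelations are replaced by their sample counterparts. The Python sketch below does this via the determinant ratio; the simulated AR(2) realisation uses the parameters of Example 2.4, while the function names, the seed and the number of lags are illustrative assumptions. In practice, a Durbin–Levinson type recursion would be the numerically more efficient route.

```python
import numpy as np

def sample_acf(x, kmax):
    """Sample autocorrelations rho_hat(1), ..., rho_hat(kmax)."""
    x = x - x.mean()
    denom = (x * x).sum()
    return np.array([(x[k:] * x[:-k]).sum() / denom for k in range(1, kmax + 1)])

def sample_pacf(x, kmax):
    """Partial autocorrelations phi_hat(k,k) via the determinant ratio (2.37)."""
    rho = np.concatenate(([1.0], sample_acf(x, kmax)))
    pacf = np.empty(kmax)
    for k in range(1, kmax + 1):
        R = np.array([[rho[abs(i - j)] for j in range(k)] for i in range(k)])
        Rk = R.copy()
        Rk[:, -1] = rho[1:k + 1]        # replace last column by (rho(1), ..., rho(k))'
        pacf[k - 1] = np.linalg.det(Rk) / np.linalg.det(R)
    return pacf

# illustrative AR(2) realisation with the parameters of Example 2.4
rng = np.random.default_rng(1)
T = 180
x = np.zeros(T)
u = rng.normal(size=T)
for t in range(2, T):
    x[t] = 1.5 * x[t - 1] - 0.56 * x[t - 2] + u[t]

phi = sample_pacf(x, 10)
print(np.round(phi, 2))                 # phi_hat(1,1) and phi_hat(2,2) large, the rest roughly zero
print("approx. 95 percent band: +/-", round(2 / np.sqrt(T), 2))
```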
2.1.5 Estimating Autoregressive Processes

Under the assumption of a known order p we have different possibilities to estimate the parameters:

(i) If we know the distribution of the white noise process that generates the AR(p) process, the parameters can be estimated by using maximum likelihood (ML) methods.

(ii) The parameters can also be estimated with the method of moments by using the Yule-Walker equations.

(iii) A further possibility is to treat (2.26),
xt = δ + α1 xt-1 + α2 xt-2 + ... + αp xt-p + ut,
as a regression equation and apply the ordinary least squares (OLS) method for estimation. OLS provides consistent estimates. Moreover, if (2.26) fulfils the stability conditions, √T(δ̂ – δ) as well as √T(α̂i – αi), i = 1, 2, ..., p, are asymptotically normally distributed.

If the order of the AR process is unknown, it can be estimated with the help of information criteria. For this purpose, AR processes with successively increasing orders p = 1, 2, ..., pmax are estimated. Finally, the order p* is chosen which minimises the respective criterion. The following criteria are often used:

(i) The final prediction error, which goes back to HIROTUGU AKAIKE (1969),
FPE = ((T + m)/(T – m)) · (1/T) Σ_{t=1}^{T} (û_t^(p))².

(ii) Closely related to this is the Akaike information criterion (HIROTUGU AKAIKE (1974)),
AIC = ln((1/T) Σ_{t=1}^{T} (û_t^(p))²) + m · 2/T.

(iii) Alternatives are the Bayesian criterion of GIDEON SCHWARZ (1978),
SC = ln((1/T) Σ_{t=1}^{T} (û_t^(p))²) + m · ln T/T,

(iv) as well as the criterion developed by EDWARD J. HANNAN and BARRY G. QUINN (1979),
HQ = ln((1/T) Σ_{t=1}^{T} (û_t^(p))²) + m · 2 ln(ln T)/T.

Here, û_t^(p) are the estimated residuals of the AR(p) process, while m is the number of estimated parameters. If the constant term is estimated, too, m = p + 1 for an AR(p) process. These criteria are always based on the same principle: they consist of one part, the sum of squared residuals (or its logarithm), which decreases when the number of estimated parameters increases, and of a 'penalty term', which increases when the number of estimated parameters increases. Whereas the first two criteria overestimate the true (finite) order asymptotically, the other two criteria estimate the true order of the process consistently. For T ≥ 16, the penalty term of SC is larger than the one of HQ, which itself is larger than the one of AIC. This leads to the following ordering of the estimated AR orders: SC order ≤ HQ order ≤ AIC order. Please note that choosing such an order does not always imply that we have white noise residuals. This has to be checked independently.

Many computer programmes like, for example, EViews, do not exactly report the criteria given in (ii) through (iv). Relying on the log-likelihood function instead of on the sum of squared residuals directly, they add 1 + ln(2π) ≈ 2.8379, which does, of course, neither affect the order nor which value of p minimises the information criteria.

Example 2.9
As in Example 2.6, we take a look at the development of the three month money market interest rate in Frankfurt am Main. If, for this series, we estimate AR processes up to the order p = 4, we get the following results (for T = 116):
p = 0: AIC = 4.8334, HQ = 4.8430, SC = 4.8571;
p = 1: AIC = 2.7180, HQ = 2.7373, SC = 2.7655;
p = 2: AIC = 2.4457, HQ = 2.4746, SC = 2.5169;
p = 3: AIC = 2.4609, HQ = 2.4995, SC = 2.5559;
p = 4: AIC = 2.4778, HQ = 2.5260, SC = 2.5965.
With all three criteria we get the minimum for p = 2. Thus, the optimal number of lags is p* = 2, as used in Example 2.6.
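Such an order selection can be carried out in a few lines: AR(p) models are fitted by OLS on a common estimation sample for p = 0, 1, ..., pmax, and the criteria defined above are evaluated. The following Python sketch uses simulated data rather than the Frankfurt series, so the numerical values differ from Example 2.9; the data generating process, the seed and the sample size are assumptions made only for illustration.

```python
import numpy as np

def ar_ols_resid(x, p, t0):
    """OLS residuals of an AR(p) with constant, using observations t0, ..., T-1."""
    T = len(x)
    X = np.column_stack([np.ones(T - t0)] + [x[t0 - i:T - i] for i in range(1, p + 1)])
    y = x[t0:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def order_selection(x, pmax):
    t0 = pmax                      # common estimation sample for all orders
    Teff = len(x) - t0
    for p in range(pmax + 1):
        u = ar_ols_resid(x, p, t0)
        s2 = (u @ u) / Teff        # (1/T) times the sum of squared residuals
        m = p + 1                  # number of estimated parameters, constant included
        aic = np.log(s2) + 2 * m / Teff
        hq  = np.log(s2) + 2 * m * np.log(np.log(Teff)) / Teff
        sc  = np.log(s2) + m * np.log(Teff) / Teff
        print(f"p = {p}: AIC = {aic:.4f}, HQ = {hq:.4f}, SC = {sc:.4f}")

rng = np.random.default_rng(2)
T = 116
x = np.zeros(T)
u = rng.normal(size=T)
for t in range(2, T):              # AR(2) data generating process, chosen for illustration
    x[t] = 0.5 + 1.4 * x[t - 1] - 0.5 * x[t - 2] + u[t]

order_selection(x, pmax=4)         # with this DGP the criteria will typically point to p = 2
```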
  • 32. 58 Univariate Stationary Processes 2.2 Moving Average Processes Moving average processes of an infinite order have already occurred when we presented the Wold decomposition theorem. They are, above all, of theoretical importance as, in practice, only a finite number of (different) parameters can be estimated. In the following, we consider finite order moving average processes. We start with the first order moving average process and then discuss general properties of finite order moving average processes. 2.2.1 First Order Moving Average Processes The first order moving average process (MA(1)) is given by the following equation: (2.38) xt = + ut – ut-1 , or (2.38') xt – = (l – L)ut , with ut again being a pure random process. The Wold representation of an MA(1) process (as of any finite order MA process) has a finite number of terms. In this special case, the Wold coefficients are 0 = 1, 1 = - and j 2 = 0 for j 2. Thus, j is finite for all finite values of , i.e. an MA(1) j process is always stationary. Taking expectations of (2.38) leads to E[xt] = + E[ut] – E[ut-1] = . The variance can also be calculated directly, V[xt] = E[(xt – )2] = E[(ut – ut-1)2] 2 = E[( u 2 – 2 ut ut-1 + t u 2 1 )] t 2 2 = (1 + ) = (0) . Therefore, the variance is constant at any point of time. For the covariances of the process we get E[(xt – )(xt+ – )] = E[(ut – ut-1)(ut+ – ut+ -1)] 2 = E[(utut+ – utut+ –1 – ut-1ut+ + ut-1ut+ -1)] .
  • 33. 2.2 Moving Average Processes 59 The covariances are different from zero only for = ± 1, i.e. for adjoining random variables. In this case 2 (1) = - . Thus, for an MA(1) process, all autocovariances and therefore all autocorre- lations with an order higher than one disappear, i.e. ( ) = ( ) = 0 for 2. The correlogram of an MA(1) process is (0) = 1, (1) = 2 , ( ) = 0 for 2. 1 If we consider (1) as a function of , (1) = f( ), it holds that f(0) = 0 and f( ) = -f(- ), i.e. that f( ) is point symmetric to the origin, and that |f( )| 0.5. f( ) has its maximum at = -1 and its minimum at = 1. Thus, an MA(1) process cannot have a first order autocorrelation above 0.5 or be- low -0.5. If we know the autocorrelation coefficient (1) = 1, for example, by es- timation, we can derive (estimate) the corresponding parameter by using the equation for the first order autocorrelation coefficient, 2 (1 + ) 1 + = 0. The quadratic equation can also be written as 2 1 (2.39) + + 1 = 0, 1 and it has the two solutions 1 2 1,2 = 1 1 4 1 . 2 1 Thus, the parameters of the MA(1) process can be estimated non-linearly with the method of moments: the theoretical moments are substituted by their consistent estimates and the resulting equation is used for estimating the parameters consistently. Because of | 1| 0.5, the quadratic equation always results in real roots. They also have the property that 1 2 = 1. This gives us the possibility to model the same autocorrelation structure with two different parameters, where one is the inverse of the other. In order to get a unique parameterisation, we require a further property of the MA(1) process. We ask under which conditions the MA(1) process (2.38) can have an autoregressive representation. By using the lag operator representation (2.38') we get
  • 34. 60 Univariate Stationary Processes 1 ut = – + xt . 1 1 L An expansion of the series 1/(1 – L) is only possible for < 1 and re- sults in the following AR( ) process 2 ut = – + xt + xt-1 + xt-2 + ... 1 or 2 xt + xt-1 + xt-2 + ... = + ut . 1 This representation requires the condition of invertibility ( < 1). In this case, we get a unique parameterisation of the MA(1) process. Applying the lag polynomial in (2.38'), we can formulate the invertibility condition in the following way: An MA(1) process is invertible if and only if the root of the lag polynomial 1– L = 0 is larger than one in modulus. Example 2.10 The following MA(1) process is given: (E2.5) xt = t – t-1, t ~ N(0, 22), with = -0.5. For this process we get E[xt] = 0, V[xt] = (1 + 0.52)·4 = 5, 0.5 (1) = = 0.4, 1 0.52 ( ) = 0 for 2. Solving the corresponding quadratic equation (2.39) for this value of (1) leads to the two roots 1 = -2.0 and 2 = -0.5. If we now consider the process (E2.5a) yt = t + 2 t-1, t ~ N(0, 1), we obtain the following results: E[yt] = 0, V[yt] = (1 + 2.02)·1 = 5,
  • 35. 2.2 Moving Average Processes 61 2.0 (1) = = 0.4, 1 2.02 ( ) = 0 for 2, i.e. the variances and the autocorrelogram of the two processes (E2.5) and (E2.5a) are identical. The only difference between them is that (E2.5) is invertible, be- cause the invertibility condition < 1 holds, whereas (E2.5a) is not invertible. Thus, given the structure of the correlations, we can choose the one of the two processes that fulfils the invertibility condition without imposing any restrictions on the structure of the process. With equation (2.37), the partial autocorrelation function of the MA(1) process can be calculated in the following way: 11 = (1), 1 (1) (1) 0 (1) 2 22 = = < 0, 1 (1) 1 (1) 2 (1) 1 1 (1) (1) (1) 1 0 0 (1) 0 (1)3 33 = = 0 for 0, 1 (1) 0 1 2 (1) 2 (1) 1 (1) 0 (1) 1 1 (1) 0 (1) (1) 1 (1) 0 0 (1) 1 0 0 0 (1) 0 (1) 4 44 = = < 0, 1 (1) 0 0 (1 (1) 2 ) 2 (1) 2 (1) 1 (1) 0 0 (1) 1 (1) 0 0 (1) 1 etc.
  • 36. 62 Univariate Stationary Processes If is positive, (1) is negative and vice versa. This leads to the two possible patterns of partial autocorrelation functions, exemplified by = ±0.8: = 0.8, ii {-0.49,-0.31,-0.22, -0.17, ... } , = -0.8, ii {0.49,-0.31, 0.22, -0.17, ... } . Thus, contrary to the AR(1) process, the autocorrelation function of the MA(1) process breaks off, while the partial autocorrelation function does not. These properties hold generally, since invertible finite order MA pro- cesses are equivalent to infinite order AR processes. 2.2.2 MA(1) and Temporal Aggregation The time series which are discussed in this book are measured in discrete time, with intervals of equal length. Exchange rates, for example, are nor- mally quoted at the end of each trading day. For econometric analyses, however, monthly, quarterly, or even annual data are used, rather than the- se daily values. Usually, averages or end-of-period data are used for tem- poral aggregation. Thus, two aggregation schemes have to be distinguished. The first one is skip sampling (or: systematic sampling) where only every mth data point is recorded. If xt is the basic series at t = 1, 2, 3,…, the skip sampled series ys with new time scale s is end-of-period data, y1 = xm, y2 = x2m, y3 = x3m, …, ys = xsm. Such an aggregation is typical for stock variables. However, the second scheme of averaging over m non-overlapping periods is also widely used, in particular for rates or indices: 1 y1 xm xm 1 ... x1 m 1 y2 x 2m x 2m 1 ... x m 1 m 1 ys x sm x sm 1 ... x (s 1)m 1 . m
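Both aggregation schemes are easy to mimic numerically. The following Python sketch (aggregation level m, seed and series length are illustrative assumptions) applies skip sampling and averaging to a simulated random walk and reports the first-order autocorrelation of the differenced aggregates, anticipating the MA(1) structure that averaging induces, as discussed below.

```python
import numpy as np

rng = np.random.default_rng(3)
m, S = 4, 50_000                       # aggregation level and number of aggregated periods
x = np.cumsum(rng.normal(size=m * S))  # random walk x_t = x_{t-1} + u_t

# skip sampling (end-of-period values): y_s = x_{sm}
y_skip = x[m - 1::m]

# averaging over m non-overlapping periods: ybar_s = (x_{sm} + ... + x_{(s-1)m+1}) / m
y_avg = x.reshape(S, m).mean(axis=1)

def acf1(z):
    """First-order sample autocorrelation."""
    z = z - z.mean()
    return (z[1:] * z[:-1]).sum() / (z * z).sum()

print("skip sampled differences, rho(1):", round(acf1(np.diff(y_skip)), 3))  # roughly 0: still a random walk
print("averaged differences,     rho(1):", round(acf1(np.diff(y_avg)), 3))   # positive, approaching 0.25 as m grows
```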
  • 37. 2.2 Moving Average Processes 63 In the following, we do not present a general theory of temporal aggrega- tion but just discuss a special case of particular applied interest, the ran- dom walk, with xt = xt-1 + ut, where an artificial MA(1) structure arises due to aggregation by averaging. It is straightforward to see that systematic sampling does not affect the random walk property, since in this case we can write sm ys = x0 + ut . t 1 From this representation we get ys = ys-1 + s, with s being white noise: s = usm + usm-1 + ... + u(s-1)m+1, with E[ s] = 0 and 2 m u for 0 E( s · s– ) = . 0 elsewhere Hence, the random walk property is inherited by ys, only the variance of the differences ys – ys-1 is inflated in the obvious way. In case of averaging, ys , matters get more complicated. It can, however, be shown that the dif- ferences ys ys 1 s follow no longer a white noise process but an MA(1) scheme hidden be- hind 1 s u sm 2u sm 1 ... mu s 1m 1 ... 2u s 2 m 3 us 2 m 2 . m We omit details but refer to HOLBROOK WORKING (1960) who showed that with increasing aggregation level, m , one obtains the autocorre- lation function
  • 38. 64 Univariate Stationary Processes 1, 0 E s s 1 ( ) = , 1 . V s 4 0, elsewhere Note that the above autocorrelation function corresponds to the following MA(1)-process s us us 1 where u s is white noise, and the limiting value (for m ) of the MA pa- rameter is 3 2 0.268. GEORGE C. TIAO (1972) generalised this result the following way: If xt – xt-1 is not generated by white noise but by an invertible MA(1) pro- cess, then ys ys 1 behaves with growing m like the MA(1) process us u s 1 , where is independent of the underlying MA(1) structure of xt – xt-1. This result even continues to hold when the assumption that xt – xt-1 is MA(1) is replaced by a more general moving average process of higher order as introduced in subsection 2.2.3. Example 2.11 Consider averaging over m = 2 periods, 1 ys x 2s x 2s 1 . 2 For the random walk xt = xt-1 + ut, it holds that s ys ys 1 1 = (x2s + x2s-1 – x2s-2 – x2s-3) 2 1 = ( u2s + 2 u2s-1 + u2s-2) . 2 This process can be described as s us us 1 with = 2 2 – 3 –0.172, and
  • 39. 2.2 Moving Average Processes 65 3 2 u for 0 2 1 2 E( s · s ) = u for 1, 4 0 elsewhere such that for m = 2 the autocorrelation coefficient at lag one becomes (1) = 1/6. Example 2.12 Example 1.3 as well as Figure 1.8 present the end-of-month exchange rate be- tween the Swiss Franc and the U.S. Dollar over the period from January 1974 to December 2011. The autocorrelogram of the first differences of the logarithms of this time series indicates that they follow a pure random process. The tests we ap- plied did not reject this null hypothesis. If we use monthly averages instead of end-of-month data, the following MA(1) process can be estimated for the first difference of the logarithms of this exchange rate: ln(et) = -0.003 + ût + 0.308 ût-1, (-1.53) (6.91) R2 = 0.082, SE = 0.028, Q(11) = 8.216 (p = 0.694), JB = 21.194 (p = 0.000), with the t values again given in parentheses. ln(·) denotes the natural logarithm. The estimated coefficient of the MA(1) term is highly significantly different from zero. The Ljung-Box Q-statistic indicates that there is no longer any significant autocorrelation in the residuals. As m 20 is relatively large (in this context), the estimated values of the MA(1) term should not be too different from the theoreti- cal value given by GEORGE C. TIAO (1972). The theoretical value -0.268 lies in the two-sigma confidence interval of the estimated parameter -0.308. 2.2.3 Higher Order Moving Average Processes In general, the moving average process of order q (MA(q)) can be written as (2.40) xt = + ut – 1 ut-1 – 2 ut-2 – ... – q ut-q with q 0 and ut as a pure random process. Using the lag operator we get 2 q (2.40') xt – = (1 – 1L – 2L – ... – qL )ut = (L)ut .
  • 40. 66 Univariate Stationary Processes From (2.40) we see that we already have a finite order Wold representation with k = 0 for k > q. Thus, there are no problems of convergence, and every finite MA(q) process is stationary, no matter what values are used for j, j = 1, 2, ..., q. For the expectation of (2.40) we immediately get E[xt] = . Thus, the variance can be calculated as: V[xt] = E[(xt – )2] = E[(ut – 1 ut-1 – ... – q ut-q)2] 2 = E[( u 2 + t 2 1 u 2 1 + ... + t q u2 q – 2 t 1 utut-1 – ... –2 q-1 q ut-q+1ut-q)] . From this we obtain 2 2 2 2 V[xt] = (1 + 1 + 2 + ... + q ) . For the covariances of order we can write Cov[xt, xt+ ] = E[(xt – )(xt+ – )] = E[(ut – 1ut-1 – ... – q ut-q) (ut+ – 1 ut+ -1 – ... – q ut+ -q)] = E[ut(ut+ – 1 ut+ -1 – ... – q ut+ -q) – 1 ut-1(ut+ – 1 ut+ -1 – ... – q ut+ -q) – q ut-q(ut+ – 1 ut+ -1 – ... – q ut+ -q)] . Thus, for = 1, 2, ..., q we get 2 = 1: (1) = (– 1 + 1 2 + ... + q-1 q) , 2 = 2: (2) = (– 2 + 1 3 + ... + q-2 q) , (2.41) 2 = q: (q) = – q , while we have ( ) = 0 for > q. Consequently, all autocovariances and autocorrelations with orders higher than the order of the process are zero. It is – at least theoretically – possible to identify the order of an MA(q) process by using the autocorre- logram. It can be seen from (2.41) that there exists a system of non-linear equa- tions for given (or estimated) second order moments that determines (makes it possible to estimate) the parameters 1, ..., q. As we have al-
  • 41. 2.2 Moving Average Processes 67 ready seen in the case of the MA(1) process, such non-linear equation sys- tems have multiple solutions, i.e. there exist different values for 1, 2, ... and q that all lead to the same autocorrelation structure. To get a unique parameterisation, the invertibility condition is again required, i.e. it must be possible to represent the MA(q) process as a stationary AR( ) process. Starting from (2.40'), this implies that the inverse operator -1(L) can be represented as an infinite series in the lag operator, where the sum of the coefficients has to be bounded. Thus, the representation we get is an AR( ) process -1 ut = – + (L) xt (1) = – + c jx t j , (1) j 0 where q 1 = (1 – 1L – ... – qL )( 1 + c1L + c2L2 + ... ), and the parameters ci, i = 1, 2, ... are calculated by using again the method of undetermined coefficients. Such a representation exists if all roots of q 1– 1L – ... – qL = 0 are larger than one in absolute value. Example 2.13 Let the following MA(2) process xt = ut + 0.6 ut-1 – 0.1 ut-2 be given, with a variance of 1 given for the pure random process u. For the vari- ance of x we get V[xt] = (1 + 0.36 + 0.01) 1 = 1.37 . Corresponding to (2.41) the covariances are (1) = + 0.6 – 0.06 = 0.54 (2) = – 0.1 . ( ) = 0 for > 2 This leads to the autocorrelation coefficients (1) = 0.39 and (2) = -0.07. To check whether the process is invertible, the quadratic equation 1 + 0.6 L 0.1 L2 = 0
  • 42. 68 Univariate Stationary Processes has to be solved. As the two roots -1.36 and 7.36 are larger than 1 in absolute val- ue, the invertibility condition is fulfilled, i.e. the MA(2) process can be written as an AR( ) process xt = (1 + 0.6 L – 0.1 L2) ut , 1 ut = xt 1 0.6L 0.1L2 = (1 + c1 L + c2 L2 + c3 L3 + ) xt . The unknowns ci, i = 1, 2, ..., can be determined by comparing the coefficients of the polynomials in the following way: 1 = (1 + 0.6 L – 0.1 L2)(1 + c1 L + c2 L2 + c3 L3 + ) 2 3 1 = 1 + c1 L + c2 L + c3 L + + 0.6 L + 0.6 c1 L + 0.6 c2 L3 + 2 0.1 L2 0.1 c1 L3 It holds that c1 + 0.6 = 0 c1 = 0.60, c2 + 0.6 c1 – 0.1 = 0 c2 = 0.46, c3 + 0.6 c2 – 0.1 c1 = 0 c3 = 0.34, c4 + 0.6 c3 – 0.1 c2 = 0 c4 = 0.25, . Thus, we get the following AR( ) representation xt – 0.6 xt-1 + 0.46 xt-2 – 0.34 xt-3 + 0.25 xt-4 = ut . Similarly to the MA(1) process, the partial autocorrelation function of the MA(q) process does not break off. As long as the order q is finite, the MA(q) process is stationary whatever its parameters are. If the order tends towards infinity, howev- er, for the process to be stationary the series of the coefficients has to converge just like in the Wold representation. 2.3 Mixed Processes If we take a look at the two different functions that can be used to identify autoregressive and moving average processes, we see from Table 2.1 that the situation in which neither of them breaks off can only arise if there is an MA( ) process that can be inverted to an AR( ) process, i.e. if the Wold representation of an AR( ) process corresponds to an MA( ) pro- cess. However, as pure AR or MA representations, these processes cannot
be used for empirical modelling because they can only be characterised by means of infinitely many parameters. After all, according to the principle of parsimony, the number of estimated parameters should be as small as possible when applying time series methods.

In the following, we introduce processes which contain both an autoregressive (AR) term of finite order p and a moving average (MA) term of finite order q. Hence, these mixed processes are denoted as ARMA(p,q) processes. They enable us to describe processes in which neither the autocorrelation nor the partial autocorrelation function breaks off after a finite number of lags. Again, we start with the simplest case, the ARMA(1,1) process, and consider the general case afterwards.

Table 2.1: Characteristics of the Autocorrelation and the Partial Autocorrelation Functions of AR and MA Processes

            Autocorrelation Function     Partial Autocorrelation Function
MA(q)       breaks off with q            does not break off
AR(p)       does not break off           breaks off with p

2.3.1 ARMA(1,1) Processes

An ARMA(1,1) process can be written as follows,

(2.42) xt = δ + α xt-1 + ut – β ut-1,

or, by using the lag operator,

(2.42') (1 – αL) xt = δ + (1 – βL) ut,

where ut is a pure random process. To get the Wold representation of an ARMA(1,1) process, we solve (2.42') for xt,

xt = δ/(1 – α) + ((1 – βL)/(1 – αL)) ut.

It is obvious that α ≠ β must hold, because otherwise xt would be a pure random process fluctuating around the mean μ = δ/(1 – α). The ψj, j = 0, 1, 2, ..., can be determined as follows:
  • 44. 70 Univariate Stationary Processes 1 L 2 3 = 0 + 1L + 2L + 3L + … 1 L 2 3 1 – L = (1 – L)( 0 + 1L + 2L + 3L + …) 2 3 1– L = 0 + 1L + 2L + 3L + … 2 3 – 0 L – 1L – 2L – … . Comparing the coefficients of the two lag polynomials we get L0: 0 = 1 L1: 1 – 0 = – 1 = – L2: 2 – 1 = 0 2 = ( – ) L3: 3 – 2 = 0 3 = 2 ( – ) Lj: j – j-1 = 0 j = j-1 ( – ). The j, j 2 can be determined from the linear homogeneous difference equation j – j-1 =0 with 1 = – as initial condition. The j converge towards zero if and only if | | < 1. This corresponds to the stability condition of the AR term. Thus, the ARMA(1,1) process is stationary if, with stochastic initial condi- tions, it has a stable AR(1) term. The Wold representation is 2 (2.43) xt = + ut + ( – ) ut-1 + ( – ) ut-2 + ( – ) ut-3 + ... . 1 Thus, the ARMA(1,1) process can be written as an MA( ) process. To invert the MA(1) part, | | < 1 must hold. Starting from (2.42') leads to 1 L ut = + xt . 1 1 L If 1/(1 – L) is developed into a geometric series we get 2 2 ut = + (1 – L)(1 + L + L + ... ) xt 1 2 = + xt + ( – ) xt-1 + ( – ) xt-2 + ( – ) xt-3 + ... . 1
  • 45. 2.3 Mixed Processes 71 This proves to be an AR( ) representation. It shows that the combination of an AR(1) and an MA(1) term leads to a process with both MA( ) and AR( ) representation if the AR term is stable and the MA term invertible. We obtain the first and second order moments of the stationary process in (2.42) as follows: E[xt] = E[ + xt-1 + ut – ut-1] = + E[xt-1] . Due to E[xt] = E[xt-1] = , we get = , 1 i.e. the expectation is the same as in an AR(1) process. If we set = 0 without loss of generality, the expectation is zero. The autocovariance of order 0 can then be written as (2.44) E[xt- xt] = E[xt- ( xt-1 + ut – ut-1)], which leads to (0) = (1) + E[xtut] – E[xtut-1] 2 2 for = 0. Due to (2.43), E[xtut] = and E[xtut-1] = ( – ) . Thus, we can write 2 (2.45) (0) = (1) + (1 – ( – )) . (2.44) leads to (1) = (0) + E[xt-1ut] – E[xt-1ut-1] for = 1. Because of (2.43) this can be written as 2 (2.46) (1) = (0) – . If we insert (2.46) in (2.45) and solve for (0), the resulting variance of the ARMA(1,1) process is 2 1 2 2 (2.47) (0) = 2 . 1 Inserting this into (2.46), we get ( )(1 ) 2 (2.48) (1) = 2 1
  • 46. 72 Univariate Stationary Processes for the first order autocovariance. For 2, (2.44) results in the autoco- variances (2.49) ( ) = ( -1) and the autocorrelations (2.50) ( ) = ( -1) . This results in the same difference equation as in an AR(1) process but, however, with the different initial condition ( )(1 ) (1) = 2 . 1 2 The first order autocorrelation coefficient is influenced by the MA term, while the higher order autocorrelation coefficients develop in the same way as in an AR(1) process. If the process is stable and invertible, i.e. for | | < 1 and | | < 1, the sign of (1) is determined by the sign of ( – ) because of (1 + 2 – 2 ) > 0 and (1 – ) > 0. Moreover, it follows from (2.49) that the autocorrelation function – as in the AR(1) process – is monotonic for > 0 and oscillating for < 0. Due to | | < 1 with increasing, the autocorrelation function also decreases in absolute value. Thus, the following typical autocorrelation structures are possible: (i) > 0 and > : The autocorrelation function is always positive. (ii) < 0 and < : The autocorrelation function oscillates; the initial condition (1) is negative. (iii) > 0 and < : The autocorrelation function is negative from (1) onwards. (iv) < 0 and > : The autocorrelation function oscillates; the initial condition (1) is positive. Figure 2.9 shows the development of the corresponding autocorrelation functions up to = 20 for the parameter values , {0.8, 0.5, -0.5, -0.8} in which, of course, must always hold, as otherwise the ARMA(1,1) process degenerates to a pure random process. For the partial autocorrelation function we get ( )(1 ) 11 = (1) = 2 , 1 2
Figure 2.9: Theoretical autocorrelation functions of ARMA(1,1) processes
  • 48. 74 Univariate Stationary Processes 1 (1) (1) (2) (2) (1)2 (1)( (1)) 22 = = = 2 , 1 (1) 1 (1)2 1 (1) (1) 1 because of (2) = (1), 1 (1) (1) 1 (1) (1) (1) 1 (2) (1) 1 (1) 2 (2) (1) (3) (1) (1) (1) 33 = = 1 (1) (2) 1 2 (1)3 (1) 2 (2 2 ) (1) 1 (1) (2) (1) 1 (1)( (1)) 2 = , etc. 1 2 (1)3 (1) 2 (2 2 ) Thus, the ARMA(1,1) process is a stationary stochastic process where nei- ther the autocorrelation nor the partial autocorrelation function breaks off. The following example shows how, due to measurement error, an AR(1)-process becomes an ARMA(1,1) process. Example 2.14 The ‘true’ variable x t is generated by a stationary AR(1) process, (E2.8) xt = xt 1 + ut , but it can only be measured with an error vt, i.e. for the observed variable xt it holds that (E2.9) xt = x t + vt , where vt is a pure random process uncorrelated with the random process ut. (The same model was used in Example 2.3 but with a different interpretation.) If we transform (E2.8) to ut xt = 1 L and insert it into (E2.9) we get (1 – L) xt = ut + vt – vt-1 .
  • 49. 2.3 Mixed Processes 75 For the combined error term t = ut + vt – vt-1 we get 2 2 2 (0) = u + (1 + ) v 2 (1) = - v ( ) = 0 for 2, or 2 v (1) = 2 2 2 , ( ) = 0 for 2. u (1 ) v Thus, the observable variable xt follows an ARMA(1,1) process, (1 – L) xt = (1 – L) t , where can be calculated by means of (1) and t is a pure random pro- cess. (See also the corresponding results in Section 2.2.1.) 2.3.2 ARMA(p,q) Processes The general autoregressive moving average process with AR order p and MA order q can be written as (2.51) xt = + 1 xt-1 + ... + p xt-p + ut – 1 ut-1 – ... – q ut-q , with ut being a pure random process and p 0 and q 0 having to hold. Using the lag operator, we can write p q (2.51') (1 – 1L – ... – pL ) xt = + (1 – 1L – ... – qL ) ut , or (2.51'') (L) xt = + (L) ut . As factors that are common in both polynomials can be reduced, (L) and (L) cannot have identical roots. The process is stationary if – with sto- chastic initial conditions – the stability conditions of the AR term are ful- filled, i.e. if (L) only has roots that are larger than 1 in absolute value. Then we can derive the Wold representation for which (L) = (L)(1 + 1L + 2 L2 + ... ) must hold. Again, the j, j = 1, 2, ..., can be calculated by comparing the coefficients. If, likewise, all roots of (L) are larger than 1 in absolute val- ue, the ARMA(p,q) process is also invertible. A stationary and invertible ARMA(p,q) process may either be repre- sented as an AR( ) or as an MA( ) process. Thus, neither its autocorrela-
  • 50. 76 Univariate Stationary Processes tion nor its partial autocorrelation function breaks off. In short, it is possi- ble to generate stationary stochastic processes with infinite AR and MA orders by using only a finite number of parameters. Under the assumption of stationarity, (2.51) directly results in the con- stant mean E[xt] = = . 1 1 p If, without loss of generality, we set = 0 and thus also = 0, we get the following relation for the autocovariances: ( ) = E[xt- xt] = E[xt- ( 1 xt-1 + ... + p xt-p + ut – 1 ut-1 – ... – q ut-q)] . This relation can also be written as ( ) = 1 ( -1) + 2 ( -2) + ... + p ( -p) + E[xt- ut] – 1 E[xt- ut-1] – ... – q E[xt- ut-q] . Due to the Wold representation, the covariances between xt- and ut-i, i = 0, ..., q, are zero for > q, i.e. the autocovariances for > q and > p are gen- erated by the difference equation of an AR(p) process, ( ) – 1 ( -1) – 2 ( -2) – ... – p ( -p) = 0 for > q >p whereas the first q autocovariances are also influenced by the MA part. Normalisation with (0) leads to exactly the same results for the autocorre- lations. If the orders p and q are given and the distribution of the white noise process ut is known, the parameters of an ARMA(p,q) process can be esti- mated consistently by using maximum likelihood methods. These esti- mates are also asymptotically efficient. If there is no such programme available, it is possible to estimate the parameters consistently with least squares. As every invertible ARMA(p,q) process is equivalent to an AR( ) process, first of all an AR(k) process is estimated with k sufficient- ly larger than p. From this, one can get estimates of the non-observable re- siduals ût. By employing these residuals, the ARMA(p,q) process can be estimated with the least squares method, xt = + 1 xt-1 + ... + p xt-p – 1 ût-1 – ... – q ût-q + vt . This approach can also be used if p and q are unknown. These orders can, for example, be determined by using the information criteria shown in Sec- tion 2.1.5.
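The two-step least squares idea just described can be sketched in code: a long AR(k) is estimated first to obtain residuals ût, which then serve as regressors for the lagged MA terms. The Python sketch below uses a simulated ARMA(1,1) process; the parameter values, the choice of k, the seed and the sample size are assumptions for illustration only, and maximum likelihood estimation would normally be preferred when a suitable programme is available.

```python
import numpy as np

def ols(y, X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

rng = np.random.default_rng(4)
T, alpha, beta_ma = 1_000, 0.8, 0.5          # illustrative ARMA(1,1): x_t = a x_{t-1} + u_t - b u_{t-1}
u = rng.normal(size=T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = alpha * x[t - 1] + u[t] - beta_ma * u[t - 1]

# step 1: long autoregression AR(k) to approximate the AR(infinity) representation
k = 10
Xk = np.column_stack([np.ones(T - k)] + [x[k - i:T - i] for i in range(1, k + 1)])
_, u_hat_tail = ols(x[k:], Xk)
u_hat = np.concatenate([np.zeros(k), u_hat_tail])   # pad so that indices line up with x

# step 2: regress x_t on a constant, x_{t-1} and the lagged residual estimate u_hat_{t-1}
t0 = k + 1
X2 = np.column_stack([np.ones(T - t0), x[t0 - 1:T - 1], u_hat[t0 - 1:T - 1]])
coef, _ = ols(x[t0:], X2)
# roughly 0, 0.8 and -0.5; the coefficient on u_hat_{t-1} estimates -beta_ma
print("constant, AR(1), coefficient on u_hat_{t-1}:", np.round(coef, 3))
```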
Figure 2.10: Three month money market rate in New York, 1994 – 2003
a) New York three month money market rate, 1994 – 2003 (percent)
b) Autocorrelation and partial autocorrelation functions of the first differences with confidence intervals
c) Autocorrelation function of the residuals of the estimated ARMA(1,1) process with confidence intervals
  • 52. 78 Univariate Stationary Processes Example 2.15 Figure 2.10 shows the development of the US three month money market rate (USR) as well as the estimated autocorrelation and partial autocorrelation function of the first differences of this time series for the period from March 1994 to Au- gust 2003 (114 observations). Both functions do not show a clear break-off behav- iour. Therefore, the following ARMA(1,1) model has been estimated for this time series: USRt = – 0.006 + 0.831 USRt-1 + ût – 0.457 ût-1,. (-0.73) (10.91) (-3.57) R 2 = 0.351, SE = 0.166, Q(10) = 7.897 (p = 0.639). The AR(1) as well as the MA(1) terms are different from zero and from one at any usual significance level. The autocorrelogram of the estimated residuals, which is also given in Figure 2.10, as well as the Ljung-Box Q statistic, which is calculated for this model with 12 autocorrelation coefficients (i.e. with 10 degrees of free- dom), do not provide any evidence of a higher order process. 2.4 Forecasting As mentioned in the introduction, in the 1970’s, one of the reasons for the broad acceptance of time series analysis using the Box-Jenkins approach was the fact that forecasts with this comparably simple method often out- performed forecasts generated by large econometric models. In the follow- ing, we show how ARMA models can be used for making forecasts about the future development of time series. In doing so, we assume that all ob- servations of the time series up to time t are known. 2.4.1 Forecasts with Minimal Mean Squared Errors We want to solve the problem of making a -step ahead forecast for xt with a linear prediction function, given a stationary and/or invertible data gen- erating process. ˆ ˆ Let x t ( ) be such a prediction function for xt+ . Thus, x t ( ) is a random variable for given t and . As all stationary ARMA processes have a Wold representation, we assume the existence of such a representation without loss of generality. Thus, 2 xt = + j ut j , 0 = 1, j < , j 0 j 0
  • 53. 2.4 Forecasting 79 where ut is a pure random process with the usual properties E[ut] = 0, 2 for t s E[utus] = . 0 for t s Therefore, it also holds that (2.52) xt+ = + j ut j , = 1, 2, ... . j 0 For a linear prediction function with the information given up to time t, we assume the following representation (2.53) ˆ xt ( ) = + k ut k , = 1, 2, ... , k 0 where the k , k = 0, 1, 2, ..., = 1, 2, ..., are unknown. The forecast error ˆ of a -step forecast is ft( ) = xt+ – x t ( ), = 1, 2, ..., . In order to make a good forecast, these errors should be small. The expected quadratic fore- cast error E[(xt+ – x t ( ))2], which should be minimised, is used as the cri- ˆ terion to determine the unknowns k . Taking into account (2.52) and (2.53) we can write 2 2 E [ f ( )] = E t j ut j k ut k j 0 k 0 2 = E ut 1u t 1 1u t 1 ( k k )u t k . k 0 From this it follows that 2 2 2 2 2 2 (2.54) E [ f ( )] = t 1 1 1 k k . k 0 The variance of the forecast error reaches its minimum if we set k = +k for k = 0, 1, 2, ..., . Thus, we get the optimal linear prediction function for a -step ahead forecast from (2.53), as (2.55) ˆ xt ( ) = + k ut k , = 1, 2, ... . k 0
  • 54. 80 Univariate Stationary Processes For the conditional expectation of ut+s, given ut, ut-1, …, it holds that ut s for s 0 E[ut+s|ut, ut-1, ...] = . 0 for s 0 Thus, we get the conditional expectation of xt+ , because of (2.52), as E[xt+ |ut, ut-1, ...] = + k ut k . k 0 Due to (2.55), the conditional expectation of xt+ , with all information available at time t given, is identical to the optimal prediction function. This leads to the following result: The conditional expectation of xt+ , with all information up to time t given, provides the -step forecast with mini- mal mean squared prediction error. With (2.52) and (2.55) the -step forecast error can be written as ˆ (2.56) ft( ) = xt+ – x t ( ) = ut+ + 1ut+ -1 + 2ut+ -2 + ... + -1ut+1 with E[ft( )|ut, ut-1, ...] = E[ft( )] = 0 . From these results we can immediately draw some conclusions: 1. Best linear unbiased predictions (BLUP) of stationary ARMA process- es are given by the conditional expectation for xt+ , = 1,2, … ˆ x t ( ) = E[xt+ |xt, xt-1, ...] = Et[xt+ ] . 2. For the one-step forecast errors ( = 1), ft(1) = ut+1, we get E[ft(1)] = E[ut+1] = 0, and 2 for t s E[ft(1)fs(1)] = E[ut+1us+1] = . 0 for t s The one-step forecast errors are a pure random process; they are identi- cal with the residuals of the data generating process. If the one-step prediction errors were correlated, the prediction could be improved by using the information contained in the prediction errors. In such a case, ˆ however, x t (1) would not be an optimal forecast. 3. For the -step forecast errors ( > 1) we get ft( ) = ut+ + 1ut+ -1 + 2ut+ -2 + ... + -1ut+1 ,
  • 55. 2.4 Forecasting 81 i.e. they follow a MA( -1) process with E[ft( )] = 0 and the variance 2 2 2 (2.57) V[ft( )] = 1 1 1 . This variance can be used for constructing confidence intervals for - step forecasts. However, these intervals are too narrow for practical ap- plications because they do not take into account the uncertainty in the estimation of the parameters i, i = 1, 2, ..., -1. 4. It follows from (2.57) that the forecast error variance increases mono- tonically with increasing forecast horizon : V[ft( )] V[ft( -1)] . 5. Due to (2.57) we get for the limit 2 2 2 2 2 lim V[ft( )] = lim 1 1 1 = j = V[xt] , j 0 i.e. the variance of the -step forecast error is not larger than the vari- ance of the underlying process. 6. The following variance decomposition follows from (2.55) and (2.56): (2.58) ˆ V[xt+ ] = V[ x t ( )] + V[ft( )] . 7. Furthermore, ˆ lim x t ( ) = lim k ut k = = E[xt] , k 0 i.e. for increasing forecast horizons, the forecasts converge to the (un- conditional) mean of the series. The concept of ‘weak’ rational expectations whose information set is re- stricted to the current and past values of a variable exactly corresponds to the optimal prediction approach used here. 2.4.2 Forecasts of ARMA(p,q) Processes The Wold decomposition employed in the previous section has advantages when it comes to the derivation of theoretical results, but it is not practical- ly useful for forecasting. Thus, in the following, we will discuss forecasts directly using AR, MA, or ARMA representations.
  • 56. 82 Univariate Stationary Processes Forecasts with a Stationary AR(1) Process For this process, it holds that xt = + xt-1 + ut , with | | < 1. The optimal -step forecast is the conditional mean of xt+ , i.e. Et[xt+ ] = Et[ + xt+ -1 + ut+ ] = + Et[xt+ -1] . Due to the first conclusion, we get the following first order difference equation for the prediction function ˆ xt ( ) = + ˆ x t ( -1) , which can be solved recursively: ˆ = 1: x t (1) = + ˆ x t (0) = + xt 2 ˆ = 2: x t (2) = + ˆ x t (1) = + + xt -1 ˆ xt ( ) = (1 + + ... + ) + xt 1 ˆ xt ( ) = + xt = + (xt – ). 1 1 1 As = /(1 – ) is the mean of a stationary AR(1) process, ˆ xt ( ) = + ˆ (xt – ) with lim x t ( ) = , i.e., with increasing forecast horizon , the predicted values of an AR(1) process converge geometrically to the unconditional mean of the pro- cess. The convergence is monotonic if is positive, and oscillating if is negative. To calculate the -step prediction error, the Wold representation, i.e. the MA( ) representation of the AR(1) process, can be used, 2 3 xt = + ut + ut-1 + ut-2 + ut-3 + ... . Due to (2.56) and (2.57) we get the MA( -1) process 2 -1 ft( ) = ut+ + ut+ -1 + ut+ -2 + ... + ut+1 for the forecast error with the variance 2 2 2( 1) 2 1 2 V[ft( )] = 1 = 2 . 1
  • 57. 2.4 Forecasting 83 With increasing forecast horizons, it follows that 2 lim V[ft( )] = 2 = V[xt] , 1 i.e. the prediction error variance converges to the variance of the AR(1) process. Forecasts with Stationary AR(p) Processes Starting with the representation xt = + 1 xt-1 + 2 xt-2 + ... + p xt-p + ut , the conditional mean of xt+ is given by Et[xt+ ] = + 1 Et[xt+ -1] + ... + p Et[xt+ -p] . Here, ˆ x t (s) for s 0 Et[xt+s] = . x t s for s 0 Thus, the above difference equation can be solved recursively: ˆ = 1: x t (1) = + 1 xt + 2 xt-1 + ...+ p xt+1-p ˆ = 2: x t (2) = + 1 ˆ x t (1) + 2 xt + ... + p xt+2-p , etc. Forecasts with an Invertible MA(1) Process For this process, it holds that xt = + ut – ut-1 with | | < 1. The conditional mean of xt+ is Et[xt+ ] = + Et[ut+ ] – Et[ut+ -1] . For = 1, this leads to (2.59) ˆ x t (1) = – ut , and for 2, we get ˆ xt ( ) = , i.e. the unconditional mean is the optimal forecast of xt+ , = 2, 3, ..., . For the -step prediction errors and their variances we get:
   f_t(1) = u_{t+1},                        V[f_t(1)] = σ²
   f_t(2) = u_{t+2} − β u_{t+1},            V[f_t(2)] = (1 + β²) σ²
   f_t(τ) = u_{t+τ} − β u_{t+τ-1},          V[f_t(τ)] = (1 + β²) σ².

To be able to perform the one-step forecasts (2.59), the unobservable variable u has to be expressed as a function of the observable variable x. To do this, it must be taken into account that for s ≤ t, the one-step forecast errors can be written as

(2.60)   u_s = x_s − x̂_{s-1}(1).

For t = 0, we get from (2.59)

   x̂_0(1) = μ − β u_0

with the non-observable but fixed u_0. Taking (2.60) into account, we get for t = 1

   x̂_1(1) = μ − β u_1 = μ − β (x_1 − x̂_0(1)) = μ − β x_1 + β (μ − β u_0)
           = (1 + β) μ − β x_1 − β² u_0.

Correspondingly, we get for t = 2

   x̂_2(1) = μ − β u_2 = μ − β (x_2 − x̂_1(1))
           = μ − β x_2 + β ((1 + β) μ − β x_1 − β² u_0)
           = (1 + β + β²) μ − β x_2 − β² x_1 − β³ u_0.

If we continue this procedure, the so-called backcasting, we finally arrive at a representation of the one-step prediction which, except for u_0, consists only of observable terms,

   x̂_t(1) = (1 + β + ... + β^t) μ − β x_t − β² x_{t-1} − ... − β^t x_1 − β^{t+1} u_0.

Due to the invertibility of the MA(1) process, i.e. for |β| < 1, the impact of the unknown initial value u_0 finally disappears. Similarly, one can show that, after q forecast steps, the optimal forecasts of invertible MA(q) processes, q > 1, are equal to the unconditional mean of the process and that the variance of the forecast errors is equal to the variance of the underlying process. The forecasts in observable terms are represented similarly to those of the MA(1) process.
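A minimal sketch of this backcasting recursion follows; the values of μ and β and the simulated sample are assumptions made only for illustration, and the unknown u_0 is simply set to zero, which the vanishing influence of the initial value justifies for moderately long series.

```python
import random

# Minimal sketch of one-step MA(1) forecasting via backcasting, x_t = mu + u_t - beta*u_{t-1}.
# mu, beta and the simulated sample are illustrative assumptions.
random.seed(1)
mu, beta, n = 2.0, 0.5, 200

# Simulate an MA(1) sample of length n.
u = [random.gauss(0.0, 1.0) for _ in range(n + 1)]
x = [mu + u[t] - beta * u[t - 1] for t in range(1, n + 1)]

# Backcasting: u_s = x_s - xhat_{s-1}(1), starting from an assumed u_0 = 0.
u_hat = 0.0                      # stands in for the unobservable u_0
for x_s in x:
    x_hat = mu - beta * u_hat    # one-step forecast (2.59) made at s-1
    u_hat = x_s - x_hat          # recovered innovation via (2.60)

print(f"one-step forecast xhat_t(1) = {mu - beta * u_hat:.4f}")
print(f"infeasible value mu - beta*u_t = {mu - beta * u[n]:.4f}")  # nearly identical for |beta| < 1
```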
Forecasts with ARMA(p,q) Processes

Forecasts for these processes result from combining the approaches for pure AR and MA processes. Thus, the one-step ahead forecast for a stationary and invertible ARMA(1,1) process is given by

   x̂_t(1) = δ + α x_t − β u_t.

Starting with t = 0 and taking (2.60) into account, forecasts are successively generated by backcasting. We first get

   x̂_0(1) = δ + α x_0 − β u_0,

where x_0 and u_0 are assumed to be any fixed numbers. For t = 1 we get

   x̂_1(1) = δ + α x_1 − β u_1 = δ + α x_1 − β (x_1 − x̂_0(1))
           = (1 + β) δ + (α − β) x_1 + αβ x_0 − β² u_0,

which finally leads to

(2.61)   x̂_t(1) = (1 + β + ... + β^t) δ + (α − β) x_t + β (α − β) x_{t-1} + ...
                  + β^{t-1} (α − β) x_1 + α β^t x_0 − β^{t+1} u_0.

Due to the invertibility condition, i.e. for |β| < 1, the one-step forecast for large values of t no longer depends on the unknown initial values x_0 and u_0. For the τ-step forecast, τ = 2, 3, ..., we get

   x̂_t(2) = δ + α x̂_t(1)
   x̂_t(3) = δ + α x̂_t(2)
   ...

Using (2.61), these forecasts can be calculated recursively.

2.4.3 Evaluation of Forecasts

Forecasts can be evaluated ex post, i.e. when the realised values are available. There are many kinds of measures to do this. Quite often, only graphs and/or scatter diagrams of the predicted values and the corresponding observed values of a time series are plotted. Intuitively, a forecast is 'good' if the predicted values describe the development of the series in the graphs relatively well, or if the points in the scatter diagram are concentrated around the angle bisecting line in the first and/or third quadrant.
Such intuitive arguments are, however, not founded on the above-mentioned considerations on optimal predictions. For example, as (2.59) shows, the optimal one-step forecast of an MA(1) process is a pure random process. This implies that the graphs compare two quite different processes. Conclusion 6 given above states that the following decomposition holds for the variances of the data generating process, the forecasts and the forecast errors,

   V[x_{t+τ}] = V[x̂_t(τ)] + V[f_t(τ)].

Thus, it is obvious that predicted and realised values are generally generated by different processes. As a result, a measure for the predictability of stationary processes can be developed. It is defined as follows,

(2.62)   P(τ)² = V[x̂_t(τ)] / V[x_t] = 1 − V[f_t(τ)] / V[x_t],

with 0 ≤ P(τ)² ≤ 1. At the same time, P(τ)² is the square of the correlation coefficient between the predicted and the realised values of x. The optimal forecast of a pure random process with mean zero is x̂_t(τ) = 0, i.e. P(τ)² = 0. Such a process cannot be predicted. On the other hand, for the one-step forecast of an MA(1) process, we can write

   P(1)² = β² σ² / ((1 + β²) σ²) = β² / (1 + β²) > 0.

However, the decomposition (2.58), which is theoretically valid for optimal forecasts, does not hold for actual (empirical) forecasts, even if they are generated by using (estimated) ARMA processes. This is due to the fact that forecast errors are hardly ever totally uncorrelated with the forecasts. Therefore, the value of P(τ)² might even become negative for 'bad' forecasts.

JACOB MINCER and VICTOR ZARNOWITZ (1969) made the following suggestion to check the consistency of forecasts. By using OLS, the following regression equation is estimated:

(2.63)   x_{t+τ} = a_0 + a_1 x̂_t(τ) + ε_{t+τ}.

It is tested, either individually with t tests or jointly with an F test, whether a_0 = 0 and a_1 = 1. If this is fulfilled, the forecasts are said to be consistent. However, such a regression produces consistent estimates of the parameters if and only if x̂_t(τ) and ε_{t+τ} are asymptotically uncorrelated.
Moreover, to get consistent estimates of the variances, which is necessary for the validity of the test results, the residuals have to be pure random processes. Even under the null hypothesis of optimal forecasts, this only holds for one-step predictions. Thus, the usual F and t tests can only be used for τ = 1. For τ > 1, the MA(τ-1) structure of the forecast errors has to be taken into account when the variances are estimated. A procedure for such situations, which combines ordinary least squares for the estimation of the parameters with generalised least squares for the estimation of the variances, was proposed by BRYAN W. BROWN and SHLOMO MAITAL (1981).

JINOOK JEONG and GANGADHARRAO S. MADDALA (1991) have pointed out another problem related to these tests. Even rational forecasts are usually not without errors; they contain measurement errors. This implies, however, that (2.63) cannot be estimated consistently with OLS; an instrumental variables estimator must be used. An alternative to the estimation of (2.63) is therefore to estimate a univariate MA(τ-1) model for the forecast errors of a τ-step prediction,

   f̂_t(τ) = a_0 + u_t + a_1 u_{t-1} + a_2 u_{t-2} + ... + a_{τ-1} u_{t-τ+1},

and to check the null hypothesis H_0: a_0 = 0 as well as whether the estimated residuals û_t are white noise.

On the other hand, simple descriptive measures, which are often employed to evaluate the performance of forecasts, are based on the average values of the forecast errors over the forecast horizon. The simple arithmetic mean indicates whether the values of the variable are, on average, over- or underestimated. The disadvantage of this measure, however, is that large over- and underestimates cancel each other out. The mean absolute error is often used to avoid this effect. Starting the forecasts from a fixed point in time, t_0, and assuming that realisations are available up to t_0 + m, we get

   MAE(τ) = (1/(m+1)) Σ_{j=0}^{m} |f_{t_0+j}(τ)|,   τ = 1, 2, ... .

Every forecast error gets the same weight in this measure. The root mean square error is often used to give particularly large errors a stronger weight:

   RMSE(τ) = sqrt( (1/(m+1)) Σ_{j=0}^{m} f_{t_0+j}(τ)² ),   τ = 1, 2, ... .

These measures are not normalised, i.e. their size depends on the scale of the data.
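The following sketch collects, under purely illustrative assumptions, two of the evaluation tools just discussed: the Mincer-Zarnowitz regression (2.63) for one-step forecasts, here estimated with a plain least-squares call that ignores the standard-error issues mentioned above, and the scale-dependent measures MAE and RMSE. The simulated series and 'forecasts' are hypothetical, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical realisations and one-step forecasts (noisy AR(1) path and its conditional mean).
n = 120
x = np.empty(n)
x[0] = 0.0
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + rng.normal()
x_hat = 0.7 * np.roll(x, 1)        # "forecasts" alpha*x_{t-1}; first element is invalid
x, x_hat = x[1:], x_hat[1:]        # drop the invalid first pair

# Mincer-Zarnowitz regression (2.63): x_{t+1} = a0 + a1 * xhat_t(1) + error.
X = np.column_stack([np.ones_like(x_hat), x_hat])
(a0, a1), *_ = np.linalg.lstsq(X, x, rcond=None)
print(f"a0 = {a0:.3f} (should be near 0), a1 = {a1:.3f} (should be near 1)")

# Scale-dependent summary measures of the forecast errors f = x - xhat.
f = x - x_hat
mae = np.mean(np.abs(f))
rmse = np.sqrt(np.mean(f ** 2))
print(f"MAE = {mae:.3f}, RMSE = {rmse:.3f}")
```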
The inequality measure proposed by HENRY THEIL (1961) avoids this problem by comparing the actual forecasts with so-called naïve forecasts, i.e. the realised values of the last available observation,

   U(τ) = sqrt( Σ_{j=0}^{m} f_{t_0+j}(τ)² / Σ_{j=0}^{m} (x_{t_0+τ+j} − x_{t_0+j})² ),   τ = 1, 2, ... .

If U(τ) = 1, the forecast is as good as the naïve forecast x̂_t(τ) = x_t. For U(τ) < 1 the forecasts perform better than the naïve one. MAE, RMSE and Theil's U all become zero if predicted and realised values are identical over the whole forecast horizon.

Example 2.16

All these measures can also be applied to forecasts which are not generated by ARMA models, as, for example, the forecasts of the Council of Economic Experts or of the Association of German Economic Research Institutes. Since the end of the 1960s, both institutions have published forecasts of German economic development for the following year, the institutes usually in October and the Council at the end of November. HANNS MARTIN HAGEN and GEBHARD KIRCHGÄSSNER (1996) investigated the annual forecasts of the growth rate of GNP for the period from 1970 to 1995 as well as for the sub-periods from 1970 to 1982 and from 1983 to 1995. These sub-periods correspond to the social-liberal government of SPD and FDP and the conservative-liberal government of CDU/CSU and FDP, respectively. The results are given in Table 2.2. Besides the criteria given above, the table also reports the square of the correlation coefficient between realised and predicted values (R²), the estimated regression coefficient â₁ of the test equation (2.63), and the mean error (ME). According to almost all criteria, the forecasts of the Council outperform those of the institutes. This was to be expected, as the Council's forecasts are produced slightly later, at a time when more information is available. For the forecasts of both institutions, the mean absolute error, the root mean square error and Theil's U are smaller in the second period than in the first one. This is some evidence that the forecasts might have improved over time. On the other hand, the correlation coefficient between predicted and realised values has also become smaller, which indicates a deterioration of the forecasts. It has to be taken into account that the variance of the variable to be predicted was considerably smaller in the second period than in the first one. Thus, the smaller errors do not necessarily indicate improvements of the forecasts. It is also interesting to note that, on average, the forecast errors of both institutions were negative in the first and positive in the second sub-period: they tended to overestimate the development in the period of the social-liberal coalition and to underestimate it in the period of the conservative-liberal coalition.
Table 2.2: Forecasts of the Council of Economic Experts and of the Economic Research Institutes

                               Period        R²      RMSE    MAE     ME       â₁      U
Institutes                     1970 – 1995   0.369   1.838   1.346   -0.250*  1.005*  0.572
                               1970 – 1982   0.429   2.291   1.654   -0.731   1.193*  0.625
                               1983 – 1995   0.399   1.229   1.038    0.231   1.081   0.457
Council of Economic Experts    1970 – 1995   0.502*  1.647*  1.171*  -0.256   1.114   0.512*
                               1970 – 1982   0.599*  2.025*  1.477*  -0.723*  1.354   0.552*
                               1983 – 1995   0.472*  1.150*  0.865*   0.212*  1.036*  0.428*

'*' denotes the 'better' of the two forecasts.

2.5 The Relation between Econometric Models and ARMA Processes

The ARMA model-based forecasts discussed in the previous section are unconditional forecasts. The only information used to generate these forecasts is the information contained in the current and past values of the time series. There is demand for such forecasts, and, as mentioned above, one of the reasons for the development and the popularity of the Box-Jenkins methodology presented in this chapter is that, by applying the above-mentioned approaches, these predictions perform, at least partly, much better than forecasts generated by large-scale econometric models. Thus, the Box-Jenkins methodology seems to be a (possibly much better) alternative to the traditional econometric methodology.

However, this perspective is rather restricted. On the one hand, conditional rather than unconditional forecasts are required in many cases, for example, in order to evaluate the effect of a tax reform on economic growth. Such forecasts cannot be generated by using (only) univariate models. On the other hand, and more importantly, the separation of the two approaches is much less strict than it seems to be at first glance. As ARNOLD ZELLNER and FRANZ C. PALM (1974) showed, linear dynamic simultaneous equation systems as used in traditional econometrics can be transformed into ARMA models. (Inversely, multivariate time series models as discussed in the next chapters can be transformed into traditional econometric models.) The univariate ARMA models correspond to the final equations of econometric models in the terminology of JAN TINBERGEN (1940).
Let us consider a very simple model. An exogenous, weakly stationary variable x, as defined in (2.64b), has a current and lagged impact on the dependent variable y, while the error term might be autocorrelated. Thus, we get the model

(2.64a)   y_t = δ_1(L) x_t + δ_2(L) u_{1,t},
(2.64b)   α(L) x_t = β(L) u_{2,t},

where δ_1(L) and δ_2(L) are lag polynomials of finite order. If we insert (2.64b) into (2.64a), we get for y the univariate model

(2.64a')   α(L) y_t = β̃(L) v_t   with   β̃(L) v_t := δ_1(L) β(L) u_{2,t} + δ_2(L) α(L) u_{1,t}.

As β̃(L) v_t is an MA process of finite order, we get a finite order ARMA representation for y. It must be pointed out that the univariate representations of the two variables have the same finite order AR term.

References

Since the time when HERMAN WOLD developed the class of ARMA processes in his dissertation and GEORGE E.P. BOX and GWILYM M. JENKINS (1970) popularised and further developed this model class in the textbook mentioned above, there have been quite a number of textbooks dealing with these models at different technical levels.

An introduction focusing on empirical applications is, for example, to be found in

ROBERT S. PINDYCK and DANIEL L. RUBINFELD, Econometric Models and Economic Forecasts, McGraw-Hill, Boston et al., 4th edition 1998, Chapter 17f., pp. 521 – 578,

PETER J. BROCKWELL and RICHARD A. DAVIS, Introduction to Time Series and Forecasting, Springer, New York et al. 1996, as well as

TERENCE C. MILLS, Time Series Techniques for Economists, Cambridge University Press, Cambridge (England) 1990.

Contrary to this,

PETER J. BROCKWELL and RICHARD A. DAVIS, Time Series: Theory and Methods, Springer, New York et al. 1987,

give a presentation that is rigorous in terms of probability theory. Along with the respective proofs of the theorems, this textbook nevertheless contains many empirical examples.
Autoregressive processes for the residuals of an estimated regression equation were used for the first time in econometrics by

DONALD COCHRANE and GUY H. ORCUTT, Application of Least Squares Regression to Relationships Containing Autocorrelated Error Terms, Journal of the American Statistical Association 44 (1949), pp. 32 – 61.

The different information criteria to detect the order of an autoregressive process are presented in

HIROTUGU AKAIKE, Fitting Autoregressive Models for Prediction, Annals of the Institute of Statistical Mathematics 21 (1969), pp. 243 – 247,

HIROTUGU AKAIKE, A New Look at the Statistical Model Identification, IEEE Transactions on Automatic Control AC-19 (1974), pp. 716 – 723,

GIDEON SCHWARZ, Estimating the Dimension of a Model, Annals of Statistics 6 (1978), pp. 461 – 464, as well as in

EDWARD J. HANNAN and BARRY G. QUINN, The Determination of the Order of an Autoregression, Journal of the Royal Statistical Society B 41 (1979), pp. 190 – 195.

The effect of temporal aggregation on the first differences of temporal averages was first investigated by

HOLBROOK WORKING, Note on the Correlation of First Differences of Averages in a Random Chain, Econometrica 28 (1960), pp. 916 – 918,

and later on, in more detail, by

GEORGE C. TIAO, Asymptotic Behaviour of Temporal Aggregates of Time Series, Biometrika 59 (1972), pp. 525 – 531.

The approach to check the consistency of predictions was developed by

JACOB MINCER and VICTOR ZARNOWITZ, The Evaluation of Economic Forecasts, in: J. MINCER (ed.), Economic Forecasts and Expectations, National Bureau of Economic Research, New York 1969.

The use of MA processes of the forecast errors to estimate the variances of the estimated parameters was presented by

BRYAN W. BROWN and SHLOMO MAITAL, What Do Economists Know? An Empirical Study of Experts' Expectations, Econometrica 49 (1981), pp. 491 – 504.

The fact that measurement errors also play a role in rational forecasts and that, therefore, instrumental variable estimators should be used, was pointed out by

JINOOK JEONG and GANGADHARRAO S. MADDALA, Measurement Errors and Tests for Rationality, Journal of Business and Economic Statistics 9 (1991), pp. 431 – 439.
These procedures have been applied to the common forecasts of the German economic research institutes by

GEBHARD KIRCHGÄSSNER, Testing Weak Rationality of Forecasts with Different Time Horizons, Journal of Forecasting 12 (1993), pp. 541 – 558.

Moreover, the forecasts of the German Council of Economic Experts as well as those of the German Economic Research Institutes were investigated in

HANNS MARTIN HAGEN and GEBHARD KIRCHGÄSSNER, Interest Rate Based Forecasts of German Economic Growth: A Note, Weltwirtschaftliches Archiv 132 (1996), pp. 763 – 773.

The measure of inequality (Theil's U) was proposed by

HENRY THEIL, Economic Forecasts and Policy, North-Holland, Amsterdam 1961.

An alternative measure is given in

HENRY THEIL, Applied Economic Forecasting, North-Holland, Amsterdam 1966.

Today, both measures are implemented in computer programmes. Quite generally, forecasts for time series data are discussed in

CLIVE W.J. GRANGER, Forecasting in Business and Economics, Academic Press, 2nd edition 1989.

On the evaluation of the predictive accuracy of forecasts see

FRANCIS X. DIEBOLD and ROBERTO S. MARIANO, Comparing Predictive Accuracy, Journal of Business and Economic Statistics 13 (1995), pp. 253 – 263.

The relationship between time series models and econometric equation systems is analysed in

ARNOLD ZELLNER and FRANZ C. PALM, Time Series Analysis and Simultaneous Equation Econometric Models, Journal of Econometrics 2 (1974), pp. 17 – 54.

See for this also

FRANZ C. PALM, Structural Econometric Modeling and Time Series Analysis: An Integrated Approach, in: A. ZELLNER (ed.), Applied Time Series Analysis of Economic Data, U.S. Department of Commerce, Economic Research Report ER-S, Washington 1983, pp. 199 – 230.

The term final equation originates from

JAN TINBERGEN, Econometric Business Cycle Research, Review of Economic Studies 7 (1940), pp. 73 – 90.

An introduction to the solution of difference equations is given in

WALTER ENDERS, Applied Econometric Time Series, 3rd edition, Wiley, Hoboken, N.J. 2010, Chapter 1.
The permanent income hypothesis as a determinant of consumption expenditure was developed by

MILTON FRIEDMAN, A Theory of the Consumption Function, Princeton University Press, Princeton N.J. 1957.

The example of the estimated popularity function is given in

GEBHARD KIRCHGÄSSNER, Causality Testing of the Popularity Function: An Empirical Investigation for the Federal Republic of Germany, 1971 – 1982, Public Choice 45 (1985), pp. 155 – 173.