2 Univariate Stationary Processes




As mentioned in the introduction, the publication of the textbook by
GEORGE E.P. BOX and GWILYM M. JENKINS in 1970 opened a new road to
the analysis of economic time series. This chapter presents the Box-Jen-
kins Approach, its different models and their basic properties in a rather
elementary and heuristic way. These models have become an indispensa-
ble tool for short-run forecasts. We first present the most important ap-
proaches for statistical modelling of time series. These are autoregressive
(AR) processes (Section 2.1) and moving average (MA) processes (Section
2.2), as well as a combination of both types, the so-called ARMA process-
es (Section 2.3). In Section 2.4 we show how this class of models can be
used for predicting the future development of a time series in an optimal
way. Finally, we conclude this chapter with some remarks on the relation
between the univariate time series models described in this chapter and the
simultaneous equations systems of traditional econometrics (Section 2.5).


2.1 Autoregressive Processes

We know autoregressive processes from traditional econometrics: Already
in 1949, DONALD COCHRANE and GUY H. ORCUTT used the first order au-
toregressive process for modelling the residuals of a regression equation.
We will start with this process, then treat the second order autoregressive
process and finally show some properties of autoregressive processes of an
arbitrary but finite order.


2.1.1 First Order Autoregressive Processes

Derivation of Wold’s Representation

A first order autoregressive process, an AR(1) process, can be written as
an inhomogeneous stochastic first order difference equation,
(2.1)                                x_t = δ + α x_{t-1} + u_t,


where the inhomogeneous part δ + u_t consists of a constant term and a
pure random process u_t. Let us assume that for t = t_0 the initial value x_{t_0} is
given. By successive substitution in (2.1) we get
         x_{t_0+1} = δ + α x_{t_0} + u_{t_0+1}

         x_{t_0+2} = δ + α x_{t_0+1} + u_{t_0+2}
                   = δ + α(δ + α x_{t_0} + u_{t_0+1}) + u_{t_0+2}
                   = δ + αδ + α² x_{t_0} + α u_{t_0+1} + u_{t_0+2}

         x_{t_0+3} = δ + α x_{t_0+2} + u_{t_0+3}
                   = δ + αδ + α²δ + α³ x_{t_0} + α² u_{t_0+1} + α u_{t_0+2} + u_{t_0+3}
              ⋮
         x_{t_0+τ} = (1 + α + α² + … + α^{τ-1})δ + α^τ x_{t_0}
                      + α^{τ-1} u_{t_0+1} + α^{τ-2} u_{t_0+2} + … + α u_{t_0+τ-1} + u_{t_0+τ},

or

         x_{t_0+τ} = α^τ x_{t_0} + ((1 − α^τ)/(1 − α)) δ + Σ_{j=0}^{τ-1} α^j u_{t_0+τ-j} .

For t = t_0 + τ, we get

(2.2)    x_t = α^{t-t_0} x_{t_0} + ((1 − α^{t-t_0})/(1 − α)) δ + Σ_{j=0}^{t-t_0-1} α^j u_{t-j} .


The development and thus the properties of this process are mainly determined by the assumptions on the initial condition x_{t_0}.
   The case of a fixed (deterministic) initial condition is given if x_0 is assumed to be a fixed (real) number, for example for t_0 = 0, i.e. no random variable. Then we can write:

         x_t = α^t x_0 + ((1 − α^t)/(1 − α)) δ + Σ_{j=0}^{t-1} α^j u_{t-j} .


This process consists of time dependent deterministic and stochastic parts.
Thus, it can never be weakly stationary, since first and second order moments are time dependent. It is, however, asymptotically stationary because the time dependence vanishes for t_0 → −∞.
   We can imagine the case of stochastic initial conditions as (2.1) being generated along the whole time axis, i.e. −∞ < t < ∞. If we observe the process only for positive values of t, the initial value x_0 is a random variable which is generated by this process. Formally, the process with stochastic initial conditions results from (2.2) if the solution of the homogeneous difference equation has disappeared. This is only possible if |α| < 1. Therefore, in the following, we restrict α to the interval −1 < α < 1. If lim_{t_0→−∞} x_{t_0} is bounded, (2.2) converges for t_0 → −∞ to

(2.3)    x_t = δ/(1 − α) + Σ_{j=0}^{∞} α^j u_{t-j} .

The time dependence has disappeared. According to Section 1.5, the AR(1)
process (2.1) has the Wold representation (2.3) with ψ_j = α^j and |α| < 1.
This results in the convergence of

         Σ_{j=0}^{∞} ψ_j² = Σ_{j=0}^{∞} α^{2j} = 1/(1 − α²) .
Thus, assuming stochastic initial conditions, the process (2.1) is weakly
stationary.
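As a practical illustration, the following minimal sketch (Python with NumPy; purely hypothetical values δ = 1, α = 0.8) starts the recursion (2.1) from two very different fixed initial values and shows that their influence dies out because |α| < 1; discarding such a burn-in period is the usual way of mimicking stochastic initial conditions in a simulation.

```python
import numpy as np

# Minimal sketch (hypothetical values delta = 1, alpha = 0.8): the AR(1)
# recursion (2.1) is started from two very different fixed initial values.
# Because |alpha| < 1 the influence of the initial value dies out
# geometrically, which is what the assumption of stochastic initial
# conditions captures.
rng = np.random.default_rng(0)
delta, alpha, T = 1.0, 0.8, 200
u = rng.standard_normal(T)            # pure random process with variance 1

def ar1_path(x0):
    x, x_prev = np.empty(T), x0
    for t in range(T):
        x_prev = delta + alpha * x_prev + u[t]   # equation (2.1)
        x[t] = x_prev
    return x

path_a, path_b = ar1_path(50.0), ar1_path(-50.0)
print(np.max(np.abs(path_a[50:] - path_b[50:])))  # practically zero
```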

The Lag Operator

Equation (2.3) can also be derived from relation (2.1) by using the lag operator defined in Section 1.3:

(2.1')                 (1 − αL) x_t = δ + u_t .

If we solve for x_t we get

(2.4)                  x_t = δ/(1 − αL) + (1/(1 − αL)) u_t .

The expression 1/(1 − αL) can formally be expanded to a geometric series,

         1/(1 − αL) = 1 + αL + α²L² + α³L³ + … .

Thus, we get

         x_t = (1 + αL + α²L² + …)δ + (1 + αL + α²L² + …)u_t
             = (1 + α + α² + …)δ + u_t + α u_{t-1} + α² u_{t-2} + … ,

and because of |α| < 1

         x_t = δ/(1 − α) + Σ_{j=0}^{∞} α^j u_{t-j} .


The first term could have been derived immediately if we substituted the
value ‘1’ for L in the first term of (2.4). (See also relation (1.8) on p. 11).
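The following sketch (hypothetical α, δ = 0, NumPy assumed) checks this inversion numerically: applying the truncated geometric-series weights α^j to the innovations reproduces, up to a small truncation error, the series generated recursively from (2.1).

```python
import numpy as np

# Sketch of the geometric-series inversion of (1 - alpha*L) with delta = 0
# and a hypothetical alpha: applying the truncated weights alpha^j to the
# innovations reproduces the recursively generated AR(1) series up to a
# small truncation error.
rng = np.random.default_rng(1)
alpha, T, J = 0.8, 300, 60                 # J: truncation point of the series

u = rng.standard_normal(T + J)
x_rec = np.zeros(T + J)                    # recursive solution of (2.1)
for t in range(1, T + J):
    x_rec[t] = alpha * x_rec[t - 1] + u[t]

psi = alpha ** np.arange(J)                # weights alpha^j, j = 0, ..., J-1
x_ma = np.array([psi @ u[t - np.arange(J)] for t in range(J, T + J)])

print(np.max(np.abs(x_rec[J:] - x_ma)))    # small: both representations agree
```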

Calculation of Moments

Due to representation (2.3), the first and second order moments can be calculated. As E[u_t] = 0 holds for all t, we get for the mean

         E[x_t] = E[ δ/(1 − α) + Σ_{j=0}^{∞} α^j u_{t-j} ]

                = δ/(1 − α) + Σ_{j=0}^{∞} α^j E[u_{t-j}] = δ/(1 − α) = μ ,

i.e. the mean is constant. It is different from zero if and only if δ ≠ 0. Because of 1 − α > 0, the sign of the mean is determined by the sign of δ. For the variance we get

         V[x_t] = E[(x_t − δ/(1 − α))²] = E[( Σ_{j=0}^{∞} α^j u_{t-j} )²]

                = E[(u_t + α u_{t-1} + α² u_{t-2} + ...)²]

                = E[u_t² + α² u_{t-1}² + α⁴ u_{t-2}² + … + 2α u_t u_{t-1} + 2α² u_t u_{t-2} + … ]

                = σ²(1 + α² + α⁴ + ...),

because E[u_t u_s] = 0 for t ≠ s and E[u_t u_s] = σ² for t = s. Applying the summation formula for the geometric series, and because of |α| < 1, we get the constant variance

         V[x_t] = σ²/(1 − α²) .

The covariances can be calculated as follows:

         Cov[x_t, x_{t-τ}] = E[(x_t − δ/(1 − α))(x_{t-τ} − δ/(1 − α))]

                = E[(u_t + α u_{t-1} + ... + α^τ u_{t-τ} + ...)
                      (u_{t-τ} + α u_{t-τ-1} + α² u_{t-τ-2} + ...)]

                = E[((u_t + α u_{t-1} + ... + α^{τ-1} u_{t-τ+1})
                      + α^τ (u_{t-τ} + α u_{t-τ-1} + α² u_{t-τ-2} + ...))
                      (u_{t-τ} + α u_{t-τ-1} + α² u_{t-τ-2} + ...)]

                = α^τ E[(u_{t-τ} + α u_{t-τ-1} + α² u_{t-τ-2} + ... )²] .

Thus, we get

         Cov[x_t, x_{t-τ}] = α^τ V[x_{t-τ}] = α^τ σ²/(1 − α²) .

The autocovariances are only a function of the time difference and not of
time t, and we can write:

(2.5)    γ(τ) = α^τ σ²/(1 − α²) ,    τ = 0, 1, 2, ... .

Therefore, the AR(1) process with |α| < 1 and stochastic initial conditions
is weakly stationary.
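A small numerical check of (2.5), under the assumption that NumPy is available and with a hypothetical α: the sample autocovariances of a long simulated realisation should be close to α^τ σ²/(1 − α²).

```python
import numpy as np

# Numerical check of (2.5) with a hypothetical alpha: the sample
# autocovariances of one long simulated realisation should be close to
# gamma(tau) = alpha^tau * sigma^2 / (1 - alpha^2).
rng = np.random.default_rng(2)
alpha, sigma2, T = 0.7, 1.0, 100_000

u = np.sqrt(sigma2) * rng.standard_normal(T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = alpha * x[t - 1] + u[t]

def sample_autocov(x, tau):
    xd = x - x.mean()
    return np.dot(xd[: len(x) - tau], xd[tau:]) / len(x)

for tau in range(5):
    theo = alpha ** tau * sigma2 / (1.0 - alpha ** 2)
    print(tau, round(theo, 3), round(sample_autocov(x, tau), 3))
```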

An Alternative Method for the Calculation of Moments

Under the condition of weak stationarity, i.e. for |α| < 1 and stochastic initial conditions, the mean of x_t is constant. If we apply the expectation operator on equation (2.1), we get:

         E[x_t] = E[δ + α x_{t-1} + u_t] = δ + α E[x_{t-1}] + E[u_t] .

Because of E[u_t] = 0 and E[x_t] = E[x_{t-1}] = μ for all t we can write

         E[x_t] = μ = δ/(1 − α) .

If we consider the deviations from the mean,

         x̃_t = x_t − μ

and substitute this in relation (2.1), we get:

         x̃_t + μ = δ + α(x̃_{t-1} + μ) + u_t .

From this it follows that

         x̃_t = δ + μ(α − 1) + α x̃_{t-1} + u_t

             = δ + (δ/(1 − α))(α − 1) + α x̃_{t-1} + u_t

(2.6)    x̃_t = α x̃_{t-1} + u_t .

This is the AR(1) process belonging to (2.1) with E[x̃_t] = 0.
   If we multiply equation (2.6) with x̃_{t-τ} for τ ≥ 0 and take expectations
we can write:

(2.7)    E[x̃_{t-τ} x̃_t] = α E[x̃_{t-τ} x̃_{t-1}] + E[x̃_{t-τ} u_t] .

Because of (2.3) we get

         x̃_{t-τ} = u_{t-τ} + α u_{t-τ-1} + α² u_{t-τ-2} + … .

This leads to

(2.8)    E[x̃_{t-τ} u_t] = σ²  for τ = 0,   and   E[x̃_{t-τ} u_t] = 0  for τ > 0 .

Because of the stationarity assumption and because of the (even) symmetry of the autocovariances, γ(τ) = γ(−τ), equation (2.7) results in

         τ = 0:   E[x̃_t²] = α E[x̃_t x̃_{t-1}] + σ² ,    or   γ(0) = α γ(1) + σ² ,
         τ = 1:   E[x̃_t x̃_{t-1}] = α E[x̃_{t-1}²] ,      or   γ(1) = α γ(0) .

This leads to the variance of the AR(1) process

         γ(0) = σ²/(1 − α²) .

For τ ≥ 1 (2.7) implies

         γ(1) = α γ(0)
         γ(2) = α γ(1) = α² γ(0)
         γ(3) = α γ(2) = α³ γ(0)
            ⋮
         γ(τ) = α γ(τ-1) = α^τ γ(0) .

Thus, the covariances can be calculated from the linear homogeneous first
order difference equation

         γ(τ) − α γ(τ-1) = 0

with the initial value γ(0) = σ²/(1 − α²).

The Autocorrelogram

Because of ρ(τ) = γ(τ)/γ(0), the autocorrelation function (the autocorrelogram) of the AR(1) process is

(2.9)                  ρ(τ) = α^τ ,    τ = 1, 2, ... .

This function converges geometrically to zero for τ → ∞, and its infinite
sum equals 1/(1 − α) since |α| < 1. This convergence is monotone for positive and oscillating for negative values of α.

Example 2.1

For δ = 0 and α ∈ {0.9, 0.5, -0.9}, Figures 2.1 to 2.3 each present one realisation of the corresponding AR(1) process with T = 240 observations. To generate these series, we used realisations of normally distributed pure random processes with mean zero and variance one. We always dropped the first 60 observations to eliminate the dependence on the initial values.
   The realisation for α = 0.9, presented in Figure 2.1, is relatively smooth. This is to be expected given the theoretical autocorrelation function because random variables with a considerable distance between each other still have high positive correlations.
   The development of the realisation in Figure 2.2 with α = 0.5 is much less systematic. The geometric decrease of the theoretical autocorrelation function is rather fast. The fourth order autocorrelation coefficient is only 0.0625.
   Contrary to this, the realisation of the AR(1) process with α = -0.9, presented in Figure 2.3, follows a well pronounced zigzag course with, however, alternating positive and negative amplitudes. This is consistent with the theoretical autocorrelation function indicating that all random variables with even-numbered distance are positively correlated and those with odd-numbered distance negatively correlated.
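The simulation design of this example can be sketched as follows (a hedged sketch, not the code used for the figures; NumPy and a simple sample-autocorrelation helper are assumed): δ = 0, α ∈ {0.9, 0.5, -0.9}, standard normal innovations, 60 burn-in observations dropped, T = 240 observations kept. The estimated low-order autocorrelations should then be close to α^τ.

```python
import numpy as np

# Sketch of the simulation design of Example 2.1 (not the original code):
# delta = 0, alpha in {0.9, 0.5, -0.9}, standard normal innovations,
# 60 burn-in observations dropped, T = 240 observations kept.
rng = np.random.default_rng(3)

def simulate_ar1(alpha, T=240, burn_in=60):
    u = rng.standard_normal(T + burn_in)
    x = np.zeros(T + burn_in)
    for t in range(1, T + burn_in):
        x[t] = alpha * x[t - 1] + u[t]
    return x[burn_in:]

def sample_acf(x, nlags):
    xd = x - x.mean()
    c0 = np.dot(xd, xd) / len(x)
    return np.array([np.dot(xd[: len(x) - k], xd[k:]) / len(x) / c0
                     for k in range(nlags + 1)])

for alpha in (0.9, 0.5, -0.9):
    rho_hat = sample_acf(simulate_ar1(alpha), nlags=4)
    print(alpha, np.round(rho_hat[1:], 2))    # should be close to alpha^tau
```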




Figure 2.1: AR(1) process with α = 0.9 (panels: a) Realisation; b) Theoretical autocorrelation function; c) Estimated autocorrelation function with confidence intervals)



Figure 2.2: AR(1) process with α = 0.5 (panels: a) Realisation; b) Theoretical autocorrelation function; c) Estimated autocorrelation function with confidence intervals)



Figure 2.3: AR(1) process with α = -0.9 (panels: a) Realisation; b) Theoretical autocorrelation function; c) Estimated autocorrelation function with confidence intervals)

It generally holds that the closer the parameter α is to +1, the smoother the realisations will be. For negative values of α we get zigzag developments which are the more pronounced the closer α is to -1. For α = 0 we get a pure random process.
The autocorrelation functions estimated by means of relation (1.10) with the given
realisations are also presented in Figures 2.1 to 2.3. The dotted parallel lines show
approximate 95 percent confidence intervals for the null hypothesis assuming that
the true process is a pure random process. In all three cases, the estimated func-
tions reflect quite well the typical development of the theoretical autocorrelations.

Example 2.2
In a paper on the effect of economic development on the electoral chances of the
German political parties during the period of the social-liberal coalition from 1969
to 1982, GEBHARD KIRCHGÄSSNER (1985) investigated (besides other issues) the
time series properties of the popularity series of the parties constructed by monthly
surveys of the Institute of Demoscopy in Allensbach (Germany). For the period
from January 1971 to April 1982, the popularity series of the Christian Democrat-
ic Union (CDU), i.e. the share of voters who answered that they would vote for
this party (or its Bavarian sister party, the CSU) if there were a general election by
the following Sunday, is given in Figure 2.4. The autocorrelation and the partial
autocorrelation function (which is discussed in Section 2.1.4) are also presented in
this figure. While the autocorrelation function goes slowly towards zero, the partial autocorrelation function breaks off after τ = 1. This argues for an AR(1) process.
   The model has been estimated with Ordinary Least Squares (OLS), the method
proposed in Section 2.1.5 for the estimation of autoregressive models. Thus, we
get:
              CDU_t = 8.053 + 0.834 CDU_{t-1} + û_t,
                      (3.43)  (17.10)
              R̄² = 0.683, SE = 1.586, Q(11) = 12.516 (p = 0.326).
The estimated t values are given in parentheses, SE denotes the standard error of
the residuals. The autocorrelogram, which is also given in Figure 2.4, does not in-
dicate any higher-order process. Moreover, given the high p-value, the Ljung-Box
Q statistic with 12 correlation coefficients (i.e. with 11 degrees of freedom) gives
no reason to reject this model. The mean is calculated as

         μ̂ = 8.053/(1 − 0.834) = 48.512 .

It shows that about 48.5 percent of the voters voted on average for the CDU dur-
ing this period.
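Since the Allensbach popularity series is not reproduced here, the following sketch illustrates the estimation step on a simulated AR(1) series with hypothetical parameters of a similar magnitude: x_t is regressed by OLS on a constant and x_{t-1}, and the implied mean is recovered as δ̂/(1 − α̂), exactly as in the calculation above.

```python
import numpy as np

# Sketch of the OLS step of Example 2.2 on a simulated AR(1) series with
# hypothetical parameters of a similar magnitude (the Allensbach data are
# not reproduced here): regress x_t on a constant and x_{t-1}, then recover
# the implied mean delta_hat / (1 - alpha_hat).
rng = np.random.default_rng(4)
delta, alpha, sigma, T = 8.0, 0.83, 1.6, 136

x = np.empty(T)
x[0] = delta / (1 - alpha)                     # start at the theoretical mean
for t in range(1, T):
    x[t] = delta + alpha * x[t - 1] + sigma * rng.standard_normal()

Y = x[1:]
X = np.column_stack([np.ones(T - 1), x[:-1]])  # constant and lagged value
delta_hat, alpha_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
print(delta_hat, alpha_hat, delta_hat / (1 - alpha_hat))   # implied mean
```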




Figure 2.4: Popularity of the CDU/CSU, 1971 – 1982 (panels: a) Popularity of the CDU/CSU, 1971 – 1982; b) Autocorrelation and partial autocorrelation functions with confidence intervals; c) Estimated autocorrelation function of the residuals of the estimated AR(1) process with confidence intervals)

Stability Conditions

Along with the stochastic initial value, the condition |α| < 1, the so-called stability condition, is crucial for the stationarity of the AR(1) process. We can also derive the stability condition from the linear homogeneous difference equation, which is given for the process itself by

         x_t − α x_{t-1} = 0,

for its autocovariances by

         γ(τ) − α γ(τ-1) = 0

and for the autocorrelations by

         ρ(τ) − α ρ(τ-1) = 0.

These difference equations have stable solutions, i.e. lim_{τ→∞} ρ(τ) = 0, if and only if their characteristic equation

(2.10)                 λ − α = 0

has a solution (root) with an absolute value smaller than one, i.e. if |α| < 1 holds. We get an equivalent condition if we do not consider the characteristic equation but the lag polynomial of the corresponding difference equations,

(2.11)                 1 − αL = 0.

This implies that the solution has to be larger than one in absolute value.
(Strictly speaking, L, which denotes an operator, has to be substituted by a
variable, which is often denoted by ‘z’. To keep the notation simple, we
use L in both meanings.)
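A minimal numerical version of this check (hypothetical α, NumPy assumed): the root of the characteristic equation must lie inside the unit circle, equivalently the root of the lag polynomial, with L treated as the variable z, must lie outside it.

```python
import numpy as np

# Numerical version of the stability check for a hypothetical alpha:
# the root of the characteristic equation lambda - alpha = 0 must lie
# inside the unit circle; equivalently, the root of the lag polynomial
# 1 - alpha*z = 0 (with L replaced by the variable z) must lie outside it.
alpha = 0.8
lam = np.roots([1.0, -alpha])       # characteristic equation: [0.8]
z = np.roots([-alpha, 1.0])         # lag polynomial:           [1.25]
print(np.abs(lam), np.abs(z))
```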

Example 2.3
Let us consider the stochastic process
(E2.1)                               yt = xt + vt .
In this equation, x_t is a stationary AR(1) process, x_t = α x_{t-1} + u_t, with |α| < 1; v_t is a pure random process with mean zero and constant variance σ_v² which is uncorrelated with the other pure random process u_t with mean zero and constant variance σ_u².
    We can interpret the stochastic process yt as an additive decomposition of two
stationary components. Then yt itself is stationary. In the sense of MILTON
FRIEDMAN (1957) we can interpret xt as the permanent (systematic) and vt as the
transitory component.

  What does the correlogram of y_t look like? As both x_t and v_t have zero mean, E[y_t] = 0. Multiplying (E2.1) with y_{t-τ} and taking expectations results in

         E[y_{t-τ} y_t] = E[y_{t-τ} x_t] + E[y_{t-τ} v_t] .

Due to y_{t-τ} = x_{t-τ} + v_{t-τ}, we get

         E[y_{t-τ} y_t] = E[x_{t-τ} x_t] + E[v_{t-τ} x_t] + E[x_{t-τ} v_t] + E[v_{t-τ} v_t] .

As u_t and v_t are uncorrelated, it holds that E[v_{t-τ} x_t] = E[x_{t-τ} v_t] = 0, and because of the stationarity of the two processes, we can write

(E2.2)                 γ_y(τ) = γ_x(τ) + γ_v(τ) .

For τ = 0 we get the variance of y_t as

         γ_y(0) = γ_x(0) + σ_v² = σ_u²/(1 − α²) + σ_v² .

For τ > 0, because of γ_v(τ) = 0 for τ ≠ 0, we get from (E2.2)

         γ_y(τ) = γ_x(τ) = α^τ σ_u²/(1 − α²) .

Thus, we finally get

         ρ_y(τ) = α^τ / (1 + (1 − α²) σ_v²/σ_u²) ,    τ = 1, 2, ...,

for the correlogram of y_t. The overlay of the systematic component by the transitory component reduces the autocorrelation generated by the systematic component. The larger the variance of the transitory component, the stronger is this effect.
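The following sketch (hypothetical α, σ_u², σ_v²) computes this correlogram and compares it with the estimated autocorrelations of one long simulated realisation of y_t = x_t + v_t.

```python
import numpy as np

# Sketch of the correlogram of y_t = x_t + v_t derived in this example,
# rho_y(tau) = alpha^tau / (1 + (1 - alpha^2) * sigma_v^2 / sigma_u^2),
# checked against one long simulated realisation (hypothetical values).
rng = np.random.default_rng(5)
alpha, sigma2_u, sigma2_v, T = 0.8, 1.0, 2.0, 200_000

u = np.sqrt(sigma2_u) * rng.standard_normal(T)
v = np.sqrt(sigma2_v) * rng.standard_normal(T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = alpha * x[t - 1] + u[t]
y = x + v                                     # permanent plus transitory part

def sample_acf(z, k):
    zd = z - z.mean()
    return np.dot(zd[: len(z) - k], zd[k:]) / np.dot(zd, zd)

for tau in (1, 2, 3):
    theo = alpha ** tau / (1.0 + (1.0 - alpha ** 2) * sigma2_v / sigma2_u)
    print(tau, round(theo, 3), round(sample_acf(y, tau), 3))
```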


2.1.2 Second Order Autoregressive Processes

Generalising (2.1), the second order autoregressive process (AR(2)) can
be written as
(2.12)                 x_t = δ + α₁ x_{t-1} + α₂ x_{t-2} + u_t,

with u_t denoting a pure random process with variance σ² and α₂ ≠ 0. With
the lag operator L we get

(2.13)                 (1 − α₁L − α₂L²) x_t = δ + u_t .

With α(L) = 1 − α₁L − α₂L² we can write

(2.14)                 α(L) x_t = δ + u_t .

As for the AR(1) process, we get the Wold representation from (2.14) if
we invert α(L); i.e. under the assumption that α⁻¹(L) exists and has the
property

(2.15)                 α(L) α⁻¹(L) = 1

we can ‘solve’ for x_t in (2.14):

(2.16)                 x_t = α⁻¹(L) δ + α⁻¹(L) u_t .

If we use the series expansion with undetermined coefficients for

         α⁻¹(L) = ψ₀ + ψ₁L + ψ₂L² + ...

it has to hold that

         1 = (1 − α₁L − α₂L²)(ψ₀ + ψ₁L + ψ₂L² + ψ₃L³ + ... )

because of (2.15). This relation is an identity only if the coefficients of L^j,
j = 0, 1, 2, ..., are equal on both the right and the left hand side. We get

         1 =   ψ₀ + ψ₁L     + ψ₂L²     + ψ₃L³     + ...
                  − α₁ψ₀L   − α₁ψ₁L²   − α₁ψ₂L³   − ...
                            − α₂ψ₀L²   − α₂ψ₁L³   − ... .

Comparing the coefficients of the lag polynomials on the right- and left-
hand side finally leads to

         L⁰:                           ψ₀ = 1
         L¹:  ψ₁ − α₁ψ₀ = 0            ψ₁ = α₁
         L²:  ψ₂ − α₁ψ₁ − α₂ψ₀ = 0     ψ₂ = α₁² + α₂
         L³:  ψ₃ − α₁ψ₂ − α₂ψ₁ = 0     ψ₃ = α₁³ + 2α₁α₂

By applying this so-called method of undetermined coefficients, we get the
values ψ_j, j = 2, 3, ..., from the linear homogeneous difference equation

         ψ_j − α₁ψ_{j-1} − α₂ψ_{j-2} = 0

with the initial conditions ψ₀ = 1 and ψ₁ = α₁.
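A minimal sketch of this recursion (hypothetical coefficients α₁ = 1.5, α₂ = -0.56):

```python
import numpy as np

# Method of undetermined coefficients for an AR(2) process with
# hypothetical coefficients: psi_0 = 1, psi_1 = alpha_1, and
# psi_j = alpha_1 * psi_{j-1} + alpha_2 * psi_{j-2} for j >= 2.
alpha1, alpha2, J = 1.5, -0.56, 10

psi = np.empty(J)
psi[0], psi[1] = 1.0, alpha1
for j in range(2, J):
    psi[j] = alpha1 * psi[j - 1] + alpha2 * psi[j - 2]

# first values: 1, alpha_1, alpha_1^2 + alpha_2, alpha_1^3 + 2*alpha_1*alpha_2
print(np.round(psi[:4], 3))
```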
   The stability condition for the AR(2) process requires that, for j → ∞,
the ψ_j converge to zero, i.e. that the characteristic equation of (2.12),

(2.17)                 λ² − α₁λ − α₂ = 0,

has only roots with absolute values smaller than one, or that all solutions
of the lag polynomial in (2.13),

(2.18)                 1 − α₁L − α₂L² = 0,

are larger than one in modulus. Together with stochastic initial conditions,
this guarantees the stationarity of the process. The stability conditions are
fulfilled if the following parameter restrictions hold jointly for (2.17) and
(2.18):

         1 + (−α₁) + (−α₂) > 0,
         1 − (−α₁) + (−α₂) > 0,
         1 − (−α₂) > 0.

As a constant is not changed by the application of the lag operator, the
number ‘1’ can substitute the lag operator in the corresponding terms.
Thus, due to (2.16), the Wold representation of the AR(2) process is given
by

(2.19)   x_t = δ/(1 − α₁ − α₂) + Σ_{j=0}^{∞} ψ_j u_{t-j} ,    ψ₀ = 1.


Under the assumption of stationarity, the expected value of the stochastic
process can be calculated directly from (2.12) since E[x_t] = E[x_{t-1}] = E[x_{t-2}]
= μ. We get

         μ = δ + α₁μ + α₂μ

or

(2.20)                 E[x_t] = μ = δ/(1 − α₁ − α₂) .

As the stability conditions are fulfilled, 1 − α₁ − α₂ > 0 holds, i.e. the sign
of δ also determines the sign of μ.
   In order to calculate the second order moments, we can assume – without loss of generality – that δ = 0, which is equivalent to μ = 0. Multiplying (2.12) with x_{t-τ}, τ ≥ 0, and taking expectations leads to

(2.21)   E[x_{t-τ} x_t] = α₁ E[x_{t-τ} x_{t-1}] + α₂ E[x_{t-τ} x_{t-2}] + E[x_{t-τ} u_t] .

Because of representation (2.19), relation (2.8) holds here as well. This
leads to the following equations

         τ = 0:   γ(0) = α₁γ(1) + α₂γ(2) + σ²
(2.22)   τ = 1:   γ(1) = α₁γ(0) + α₂γ(1)
         τ = 2:   γ(2) = α₁γ(1) + α₂γ(0)

and, more generally, the following difference equation holds for the autocovariances γ(τ), τ ≥ 2,

(2.23)                 γ(τ) − α₁γ(τ-1) − α₂γ(τ-2) = 0.

As the stability conditions hold, the autocovariances which can be recursively calculated with (2.23) are converging to zero for τ → ∞.
   The relations (2.22) result in

(2.24)   V[x_t] = γ(0) = (1 − α₂)σ² / [(1 + α₂)((1 − α₂)² − α₁²)]

for the variance of the AR(2) process, and in

         γ(1) = α₁σ² / [(1 + α₂)((1 − α₂)² − α₁²)] ,

and

         γ(2) = (α₁² + α₂ − α₂²)σ² / [(1 + α₂)((1 − α₂)² − α₁²)] ,

for the autocovariances of order one and two.
   The autocorrelations can be calculated accordingly. If we divide (2.23)
by the variance γ(0) we get the linear homogeneous second order difference equation,

(2.25)                 ρ(τ) − α₁ρ(τ-1) − α₂ρ(τ-2) = 0

with the initial conditions ρ(0) = 1 and ρ(1) = α₁/(1 − α₂) for the autocorrelation function. Depending on the values of α₁ and α₂, AR(2) processes can
generate quite different developments, and, therefore, these processes can
show considerably different characteristics.

Example 2.4
Let us consider the AR(2) process
(E2.3)                     xt = 1 + 1.5 xt-1 – 0.56 xt-2 + ut
with a variance of u_t of 1. Because the characteristic equation

         λ² − 1.5λ + 0.56 = 0

has the two roots λ₁ = 0.8 and λ₂ = 0.7, (E2.3) is stationary, given that we have
stochastic initial conditions. The expected value of this process is

         μ = 1/(1 − 1.5 + 0.56) ≈ 16.67 .

The variance of (E2.3) can be calculated from (2.24) as γ(0) = 19.31. A realisation
of this process (with 180 observations) is given in Figure 2.5 in which the (estimated) mean was subtracted. Thus, the realisations fluctuate around zero, and the
process always tends to go back to the mean. This mean-reverting behaviour is a
typical property of stationary processes.
   Due to (2.25) we get

         ρ(τ) − 1.5ρ(τ-1) + 0.56ρ(τ-2) = 0,   τ = 2, 3, ...,
         with ρ(0) = 1, ρ(1) = 0.96

for the autocorrelation function. The general solution of this homogeneous difference equation is

         ρ(τ) = C₁(0.8)^τ + C₂(0.7)^τ ,

where C₁ and C₂ are two arbitrary constants. Taking into account the two initial
conditions we get

         ρ(τ) = 2.6(0.8)^τ − 1.6(0.7)^τ

for the autocorrelation coefficients. This development is also expressed in Figure
2.5. The coefficients are always positive but strictly monotonically decreasing.
Initially, the estimated autocorrelogram using the given realisation is also monotonically decreasing, but, contrary to the theoretical development, the values begin
to fluctuate from the tenth lag onwards. However, except for the coefficient for τ =
16, the estimates are not significantly different from zero; they are all inside the
approximate 95 percent confidence interval indicated by the dotted lines.
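The quantities of this example can be reproduced with a few lines (a sketch assuming NumPy; the realisation and its estimated autocorrelogram are not replicated):

```python
import numpy as np

# Reproducing the quantities of Example 2.4 (the realisation itself is not
# replicated here): roots of the characteristic equation, mean, variance
# from (2.24) and the autocorrelations from the recursion (2.25).
delta, alpha1, alpha2, sigma2 = 1.0, 1.5, -0.56, 1.0

print(np.roots([1.0, -alpha1, -alpha2]))      # roots 0.8 and 0.7
print(delta / (1 - alpha1 - alpha2))          # mean, about 16.67

gamma0 = ((1 - alpha2) * sigma2 /
          ((1 + alpha2) * ((1 - alpha2) ** 2 - alpha1 ** 2)))   # (2.24)
print(gamma0)                                 # about 19.31

rho = np.empty(15)
rho[0], rho[1] = 1.0, alpha1 / (1 - alpha2)
for tau in range(2, 15):
    rho[tau] = alpha1 * rho[tau - 1] + alpha2 * rho[tau - 2]
print(np.round(rho[:5], 3))     # matches 2.6*(0.8)^tau - 1.6*(0.7)^tau
```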

The characteristic equations of stable autoregressive processes of second
or higher order can result in conjugate complex roots. In this case, the time
series exhibit dampened oscillations, which are shocked again and again
by the pure random process. The solution of the homogeneous part of
(2.12) for conjugate complex roots can be represented by
         x_t = d^t (C₁ cos(f t) + C₂ sin(f t))

with C₁ and C₂ again being arbitrary constants that can be determined by
using the initial conditions. The dampening factor

         d = √(−α₂)

corresponds to the modulus of the two roots, and

         f = arccos( α₁ / (2√(−α₂)) )

is the frequency of the oscillation. The period of the cycles is P = 2π/f.
Processes with conjugate complex roots are well-suited to describe business cycle fluctuations.

Figure 2.5: AR(2) process with α₁ = 1.5 and α₂ = -0.56 (panels: a) Realisation; b) Theoretical autocorrelation function; c) Estimated autocorrelation function with confidence intervals)

Figure 2.6: AR(2) process with α₁ = 1.4 and α₂ = -0.85 (panels: a) Realisation; b) Theoretical autocorrelation function; c) Estimated autocorrelation function with confidence intervals)

Example 2.5
Consider the AR(2) process
(E2.4)                      xt = 1.4 xt-1 – 0.85 xt-2 + ut,
with a variance of u_t of 1. The characteristic equation

         λ² − 1.4λ + 0.85 = 0

has the two solutions λ₁ = 0.7 + 0.6i and λ₂ = 0.7 − 0.6i. (‘i’ stands for the imaginary unit: i² = −1.) The modulus (dampening factor) is d = 0.922. Thus, (E2.4)
with stochastic initial conditions and a mean of zero is stationary. According to
(2.24) the variance is given by γ(0) = 8.433.
   A realisation of this process with 180 observations is given in Figure 2.6. Its
development is cyclical around its zero mean. For the autocorrelation function we
get

         ρ(τ) − 1.4ρ(τ-1) + 0.85ρ(τ-2) = 0,   τ = 2, 3, ...,
         ρ(0) = 1, ρ(1) = 0.76,

because of (2.25).
   The general solution is

         ρ(τ) = 0.922^τ (C₁ cos(0.709τ) + C₂ sin(0.709τ)) .

Taking into account the two initial conditions, we get for the autocorrelation coefficients

         ρ(τ) = 0.922^τ (cos(0.709τ) + 0.1 sin(0.709τ)) ,

with a frequency of f = 0.709.
   In case of quarterly data, this corresponds to a period length of about 9 quarters.
Both the theoretical and the estimated autocorrelations in Figure 2.6 show this
kind of dampened periodical behaviour.
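A short sketch of these calculations, with the coefficients taken from (E2.4) and NumPy assumed:

```python
import numpy as np

# Cycle quantities of Example 2.5, computed from the AR(2) coefficients:
# complex roots, modulus d = sqrt(-alpha_2), frequency
# f = arccos(alpha_1/(2*sqrt(-alpha_2))) and period P = 2*pi/f.
alpha1, alpha2 = 1.4, -0.85

print(np.roots([1.0, -alpha1, -alpha2]))       # 0.7 + 0.6i and 0.7 - 0.6i

d = np.sqrt(-alpha2)                           # modulus (dampening factor)
f = np.arccos(alpha1 / (2.0 * np.sqrt(-alpha2)))
print(round(d, 3), round(f, 3), round(2 * np.pi / f, 1))   # 0.922, 0.709, 8.9
```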

Example 2.6
Figure 2.7 shows the development of the three month money market rate in Frank-
furt (GSR) from the first quarter of 1970 to the last quarter of 1998 as well as the
autocorrelation and the partial autocorrelation functions explained in Section 2.1.4.
Whereas the autocorrelation function tends only slowly towards zero, the partial
autocorrelation function breaks off after two lags. As will be shown below, this
indicates an AR(2) process. For the period from 1970 to 1998, estimation with
OLS results in the following:




Figure 2.7: Three month money market rate in Frankfurt, 1970 – 1998 (panels: a) Three month money market rate in Frankfurt, 1970 – 1998; b) Estimated autocorrelation and partial autocorrelation functions with confidence intervals; c) Estimated autocorrelation function of the residuals of the estimated AR(2) process with confidence intervals)

             GSR_t = 0.575 + 1.407 GSR_{t-1} – 0.498 GSR_{t-2} + û_t,
                     (2.82)  (17.50)           (-6.16)
             R̄² = 0.910, SE = 0.812, Q(6) = 6.475 (p = 0.372),
with t values being again given in parentheses. On the 0.1 percent level, both es-
timated coefficients of the lagged interest rates are significantly different from ze-
ro. The autocorrelogram of the estimated residuals (given in Figure 2.7c) as well
as the Ljung-Box Q statistic which is calculated with 8 correlation coefficients
(and 6 degrees of freedom) does not indicate any higher order process.
The two roots of the process are 0.70 ± 0.06i, i.e. they indicate dampened cycles.
The modulus (dampening factor) is d = 0.706; the frequency f = 0.079 corresponds
to a period of 79.7 quarters and therefore of nearly 20 years. Correspondingly, this
oscillation cannot be detected in the estimated autocorrelogram presented in Fig-
ure 2.7b.


2.1.3 Higher Order Autoregressive Processes

An AR(p) process can be described by the following stochastic difference
equation,
(2.26)       x_t = δ + α₁x_{t-1} + α₂x_{t-2} + ... + α_p x_{t-p} + u_t,

with α_p ≠ 0, where u_t is again a pure random process with zero mean and
variance σ². Using the lag operator we can also write:

(2.26')      (1 − α₁L − α₂L² − ... − α_p L^p) x_t = δ + u_t .
If we assume stochastic initial conditions, the AR(p) process in (2.26) is
stationary if the stability conditions are satisfied, i.e. if the characteristic
equation

(2.27)       λ^p − α₁λ^{p-1} − α₂λ^{p-2} − ... − α_p = 0

only has roots with absolute values smaller than one, or if the solutions of
the lag polynomial

(2.28)       1 − α₁L − α₂L² − ... − α_p L^p = 0

only have roots with absolute values larger than one.
   If the stability conditions are satisfied, we get the Wold representation
of the AR(p) process by the series expansion of the inverse lag polynomial,

         1/(1 − α₁L − ... − α_p L^p) = 1 + ψ₁L + ψ₂L² + ...

as

(2.29)       x_t = δ/(1 − α₁ − ... − α_p) + Σ_{j=0}^{∞} ψ_j u_{t-j} .


Generalising the approach that was used to calculate the coefficients of the
AR(2) process, the series expansion can again be calculated by the method
of undetermined coefficients.
   From (2.29) we get the constant (unconditional) expectation as

         E[x_t] = δ/(1 − α₁ − ... − α_p) = μ .

Again, similarly to the AR(1) and AR(2) cases, a necessary condition for
stability is

         1 − α₁ − α₂ − ... − α_p > 0.
   Without loss of generality we can set δ = 0, i.e. μ = 0, in order to calculate the autocovariances. Because of γ(τ) = E[x_{t-τ} x_t], we get according to
(2.26)

(2.30)   γ(τ) = E[x_{t-τ}(α₁x_{t-1} + α₂x_{t-2} + ... + α_p x_{t-p} + u_t)] .

For τ = 0, 1, ..., p, it holds that

         γ(0) = α₁γ(1)   + α₂γ(2)   + ... + α_p γ(p)   + σ²
(2.31)   γ(1) = α₁γ(0)   + α₂γ(1)   + ... + α_p γ(p-1)
            ⋮
         γ(p) = α₁γ(p-1) + α₂γ(p-2) + ... + α_p γ(0)

because of the symmetry of the autocovariances and because of E[x_{t-τ} u_t] =
σ² for τ = 0 and zero for τ > 0.
   This is a linear inhomogeneous equation system for given α_i and σ² to
derive the p + 1 unknowns γ(0), γ(1), ..., γ(p). For τ > p we get the linear
homogeneous difference equation to calculate the autocovariances of order
τ > p:

(2.32)       γ(τ) − α₁γ(τ-1) − ... − α_p γ(τ-p) = 0.

If we divide (2.32) by γ(0), we get the corresponding difference equation
to calculate the autocorrelations:

(2.33)       ρ(τ) − α₁ρ(τ-1) − ... − α_p ρ(τ-p) = 0.

The initial conditions ρ(1), ρ(2), ..., ρ(p) can be derived from the so-called
Yule-Walker equations. We get those if we successively insert τ = 1, 2, ...,
p in (2.33), or, if the last p equations in (2.31) are divided by γ(0),

         ρ(1) = α₁          + α₂ρ(1)    + α₃ρ(2)    + ... + α_p ρ(p-1)
(2.34)   ρ(2) = α₁ρ(1)      + α₂        + α₃ρ(1)    + ... + α_p ρ(p-2)
            ⋮
         ρ(p) = α₁ρ(p-1)    + α₂ρ(p-2)  + α₃ρ(p-3)  + ... + α_p

If we define ρ' = (ρ(1), ρ(2), ..., ρ(p)), α' = (α₁, α₂, ..., α_p) and the (p×p) matrix

              ⎡   1        ρ(1)      ρ(2)    ⋯   ρ(p-1) ⎤
              ⎢  ρ(1)       1        ρ(1)    ⋯   ρ(p-2) ⎥
         R =  ⎢   ⋮          ⋮         ⋮             ⋮   ⎥
              ⎣ ρ(p-1)    ρ(p-2)    ρ(p-3)   ⋯     1    ⎦

we can write the Yule-Walker equations (2.34) in matrix form,

(2.35)                 ρ = R α .

If the first p autocorrelation coefficients are given, the coefficients of the
AR(p) process can be calculated according to (2.35) as

(2.36)                 α = R⁻¹ ρ .
Equations (2.35) and (2.36) show that there is a one-to-one mapping between the p coefficients α and the first p autocorrelation coefficients ρ of
an AR(p) process. If there is a generating pure random process, it is sufficient to know either α or ρ to identify the AR(p) process. Thus, there are
two possibilities to describe the structure of an autoregressive process of
order p: the parametric representation that uses the parameters α₁, α₂, ..., α_p,
and the non-parametric representation with the first p autocorrelation coefficients ρ(1), ρ(2), ..., ρ(p). Both representations contain exactly the same
information. Which representation is used depends on the specific situation. We usually use the parametric representation to describe finite order
autoregressive processes (with known order).
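As an illustration of (2.35) and (2.36), the following sketch (hypothetical AR(2) coefficients, NumPy assumed) first computes ρ(1) and ρ(2) from the Yule-Walker equations and then recovers the coefficients from them via the matrix R:

```python
import numpy as np

# Yule-Walker equations for a hypothetical AR(2) process: compute rho(1)
# and rho(2) from the coefficients, then recover the coefficients via
# alpha = R^{-1} rho as in (2.36).
alpha1, alpha2 = 1.5, -0.56

rho1 = alpha1 / (1 - alpha2)                   # from the first equation of (2.34)
rho2 = alpha1 * rho1 + alpha2                  # from the second equation
rho = np.array([rho1, rho2])

R = np.array([[1.0, rho1],
              [rho1, 1.0]])                    # Toeplitz autocorrelation matrix
print(np.linalg.solve(R, rho))                 # recovers [1.5, -0.56]
```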

Example 2.7
Let the fourth order autoregressive process
         x_t = α₄ x_{t-4} + u_t,   0 < α₄ < 1,

be given, where u_t is again white noise with zero mean and variance σ². Applying
(2.31) we get:

         γ(0) = α₄γ(4) + σ²,
         γ(1) = α₄γ(3),
         γ(2) = α₄γ(2),
         γ(3) = α₄γ(1),
         γ(4) = α₄γ(0).

From these relations we get

         γ(0) = σ²/(1 − α₄²),
         γ(1) = γ(2) = γ(3) = 0,
         γ(4) = α₄σ²/(1 − α₄²).

As can easily be seen, only the autocovariances with lag τ = 4j, j = 1, 2, ... are different from zero, while all other autocovariances are zero. Thus, for τ > 0 we get
the autocorrelation function

         ρ(τ) = α₄^j   for τ = 4j, j = 1, 2, ...,
         ρ(τ) = 0      elsewhere.

Only every fourth autocorrelation coefficient is different from zero; the sequence
of these autocorrelation coefficients decreases monotonically like a geometric se-
ries. Employing such a model for quarterly data, this AR(4) process captures the
correlation between random variables that are distant from each other by a multiple of four periods, i.e. the structure of the correlations of all variables which
belong to the i-th quarter of a year, i = 1, 2, 3, 4, follows an AR(1) process while
the correlations between variables that belong to different quarters are always ze-
ro. Such an AR(4) process provides a simple possibility of modelling seasonal ef-
fects which typically influence the same quarters of different years. For empirical
applications, it is advisable to first eliminate the deterministic component of a sea-
sonal variation by employing seasonal dummies and then to model the remaining
seasonal effects by such an AR(4) process.
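A small simulation sketch of this seasonal AR(4) structure (hypothetical α₄ = 0.6, NumPy assumed):

```python
import numpy as np

# Simulation sketch of the seasonal AR(4) process x_t = alpha_4 x_{t-4} + u_t
# with a hypothetical alpha_4 = 0.6: only every fourth estimated
# autocorrelation should differ noticeably from zero.
rng = np.random.default_rng(6)
alpha4, T = 0.6, 100_000

u = rng.standard_normal(T)
x = np.zeros(T)
for t in range(4, T):
    x[t] = alpha4 * x[t - 4] + u[t]

xd = x - x.mean()
acf = np.array([np.dot(xd[: T - k], xd[k:]) / np.dot(xd, xd) for k in range(9)])
print(np.round(acf, 2))     # close to 1, 0, 0, 0, 0.6, 0, 0, 0, 0.36
```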


2.1.4 The Partial Autocorrelation Function

Due to the stability conditions, autocorrelation functions of stationary fi-
nite order autoregressive processes are always sequences that converge to
zero but do not break off. This makes it difficult to distinguish between
processes of different orders when using the autocorrelation function. To
cope with this problem, we introduce a new concept, the partial autocorre-
lation function. The partial correlation between two random variables is
the correlation that remains if the possible impact of all other random vari-
ables has been eliminated. To define the partial autocorrelation coefficient,
we use the new notation,


         x_t = φ_{k1} x_{t-1} + φ_{k2} x_{t-2} + … + φ_{kk} x_{t-k} + u_t ,

where φ_{ki} is the coefficient of the variable with lag i if the process has order k. (According to the former notation it holds that α_i = φ_{ki}, i = 1, 2, …, k.)
The coefficients φ_{kk} are the partial autocorrelation coefficients (of order k),
k = 1, 2, … . The partial autocorrelation measures the correlation between x_t
and x_{t-k} which remains when the influences of x_{t-1}, x_{t-2}, ..., x_{t-k+1} on x_t and
x_{t-k} have been eliminated.
   Due to the Yule-Walker equations (2.35), we can derive the partial au-
tocorrelation coefficients kk from the autocorrelation coefficients if we
calculate the coefficients kk, which belong to xt-k, for k = 1, 2, ... from the
corresponding linear equation systems
     1           (1)             (2)                (k 1)          k1            (1)
     (1)         1               (2)                (k 2)          k2            (2)
                                                                                          , k = 1, 2, ... .

   (k 1)     (k 2)             (k 3)                    1          kk            (k)

With Cramer’s rule we get
                               1              (1)              (1)
                               (1)            1                (2)

                              (k 1)       (k 2)                (k)
(2.37)      kk                                                     , k = 1, 2, ... .
                              1           (1)                (k 1)
                              (1)         1                  (k 2)

                          (k 1)          (k 2)                 1
Thus, if the data generating process (DGP) is an AR(1) process, we get for
the partial autocorrelation function:
                     11   =     (1)
                                 1           (1)
                                 (1)         (2)            (2) (1) 2
                     22   =                         =                            = 0,
                                 1           (1)            1 (1) 2
                                 (1)         1
54    Univariate Stationary Processes

because of (2) = (1)2. Generally, the partial autocorrelation coefficients
 kk = 0 for k >1 in an AR(1) process.
   If the DGP is an AR(2) process, we get
                                     (2) (1)2
             11   = (1),    22   =            ,        kk   = 0 for k > 2 .
                                     1 (1)2
The same is true for an AR(p) process: all partial autocorrelation coeffi-
cients of order higher than p are zero. Thus, for finite order autoregressive
processes, the partial autocorrelation function provides the possibility of
identifying the order of the process by the order of the last non-zero partial
autocorrelation coefficient. We can estimate the partial autocorrelation co-
efficients consistently by substituting the theoretical values in (2.37) by
their consistent estimates (1.10). For the partial autocorrelation coefficients
which have a theoretical value of zero, i.e. the order of which is larger than
the order of the process, we get asymptotically that they are normally dis-
tributed with E[ ˆ kk ] = 0 and V[ ˆ kk ] = 1/T for k > p .

Example 2.8
The AR(1) process of Example 2.1 has the following theoretical partial autocorre-
lation function: 11 = (1) = and zero elsewhere. In this example, takes on the
values 0.9, 0.5 and -0.9. The estimates of the partial autocorrelation functions for
the realisations in Figures 2.1 and 2.3 are presented in Figure 2.8. It is obvious for
both processes that these are AR(1) processes. The estimated value for the process
with = 0.9 is ˆ 11 = 0.91, while all other partial autocorrelation coefficients are
not significantly different from zero. We get ˆ = -0.91 for the process with
                                                  11

= -0.9, while all estimated higher order partial autocorrelation coefficients do not
deviate significantly from zero.
     The AR(2) process of Example 2.4 has the following theoretical partial auto-
correlation function: 11 = 0.96, 22 = -0.56 and zero elsewhere. The realisation of
this process, which is given in Figure 2.5, leads to the empirical partial autocorre-
lation function in Figure 2.8. It corresponds quite closely to the theoretical func-
tion; we get ˆ 11 = 0.95 and ˆ 22 = -0.60 and all higher order partial autocorrelation
coefficients are not significantly different from zero. The same holds for the
AR(2) process with the theoretical non-zero partial autocorrelations 11 = 0.76 and
                                                          ˆ               ˆ
  22 = -0.85 given in Example 2.5. We get the estimates 11 = 0.76 and 22 = -0.78,
whereas all higher order partial correlation coefficients are not significantly differ-
ent from zero.
2.1 Autoregressive Processes    55




        kk
    1
  0.8
  0.6
  0.4
  0.2
    0                                                              k
 -0.2              5              10                 15       20
 -0.4
 -0.6
 -0.8
             AR(1) process with    = 0.9
   -1

        kk
    1
  0.8
  0.6
  0.4
  0.2
    0                                                              k
 -0.2              5              10                  15      20
 -0.4
 -0.6
 -0.8
             AR(1) process with    = -0.9
   -1

        kk
    1
  0.8
  0.6
  0.4
  0.2
    0                                                              k
 -0.2              5              10                  15      20
 -0.4
 -0.6
 -0.8
             AR(2) process with    1   = 1.5,   2   = -0.56
   -1
        kk
    1
  0.8
  0.6
  0.4
  0.2
    0                                                              k
 -0.2              5              10                  15      20
 -0.4
 -0.6
 -0.8
   -1        AR(2) process with    1   = 1.4,   2   = -0.85


Figure 2.8: Estimated partial autocorrelation functions
56    Univariate Stationary Processes

2.1.5 Estimating Autoregressive Processes

Under the assumption of a known order p we have different possibilities to
estimate the parameters:
(i)   If we know the distribution of the white noise process that generates
      the AR(p) process, the parameters can be estimated by using maxi-
      mum likelihood (ML) methods.
(ii) The parameters can also be estimated with the method of moments by
     using the Yule-Walker equations.
(iii) A further possibility is to treat
      (2.26)             xt =    +   1   xt-1 +        2   xt-2 + ... +           p   xt-p + ut,
      as a regression equation and apply the ordinary least squares (OLS)
      method for estimation. OLS provides consistent estimates. Moreover,
      if (2.26) fulfils the stability conditions, T ( ˆ ) as well as
        T( ˆ i   i   ) , i = 1, 2, ..., p, are asymptotically normally distributed.

If the order of the AR process is unknown, it can be estimated with the
help of information criteria. For this purpose, AR processes with succes-
sively increasing orders p = 1, 2, ..., pmax are estimated. Finally, the order
p* is chosen which minimises the respective criterion. The following crite-
ria are often used:
(i)   The final prediction error which goes back to HIROTUGU AKAIKE
      (1969)
                                                             T
                                         T m 1
                             FPE =                                 (u (p) ) 2 .
                                                                    ˆt
                                         T m T              t 1


(ii) Closely related to this is the Akaike information criterion (HIROTUGU
     AKAIKE (1974))
                                                T
                                           1                               2
                            AIC = ln                  (u (p) ) 2
                                                       ˆt              m     .
                                           T    t 1                        T

(iii) Alternatives are the Bayesian criterion of GIDEON SCHWARZ (1978)
                                               T
                                          1                              ln T
                            SC = ln                  (u (p) ) 2
                                                      ˆt             m
                                          T    t 1                        T

(iv) as well as the criterion developed by EDWARD J. HANNAN and
     BARRY G. QUINN (1979)
2.1 Autoregressive Processes   57

                                     T
                                 1                           2 ln(ln T)
                     HQ = ln               (u (p) ) 2
                                            ˆt           m              .
                                 T   t 1                         T

 u (p) are the estimated residuals of the AR(p) process, while m is the number
 ˆt
of estimated parameters. If the constant term is estimated, too, m = p + 1
for an AR(p) process. These criteria are always based on the same princi-
ple: They consist of one part, the sum of squared residuals (or its loga-
rithm), which decreases when the number of estimated parameters increas-
es, and of a ‘penalty term’, which increases when the number of estimated
parameters increases. Whereas the first two criteria overestimate the true
(finite) order asymptotically, the two other criteria estimate the true order
of the process consistently. For T 16, the penalty term of SC is larger
than the one of HQ which itself is larger than the one of AIC. This leads to
the following ordering of the estimated AR orders:
                     SC order      HQ order                  AIC order.
Please note that choosing such an order does not always imply that we
have white noise residuals. This has to be checked independently. Many
computer programmes like, for example, EViews, do not exactly report the
criteria given in (ii) through (iv). Relying on the log-likelihood function
instead of on the sum of squared residuals directly, they add 1 + ln(2 )
2.8379, which does, of course, neither affect the order nor which value of p
minimises the information criteria.

Example 2.9
As in Example 2.6, we take a look at the development of the three month money
market interest rate in Frankfurt am Main. If, for this series, we estimate AR pro-
cesses up to the order p = 4, we get the following results (for T = 116):
             p = 0: AIC = 4.8334, HQ = 4.8430, SC = 4.8571;
             p = 1: AIC = 2.7180, HQ = 2.7373, SC = 2.7655;
             p = 2: AIC = 2.4457, HQ = 2.4746, SC = 2.5169;
             p = 3: AIC = 2.4609, HQ = 2.4995, SC = 2.5559;
             p = 4: AIC = 2.4778, HQ = 2.5260, SC = 2.5965.
With all three criteria we get the minimum for p = 2. Thus, the optimal number of
lags is p* = 2, as used in Example 2.6.
58     Univariate Stationary Processes


2.2 Moving Average Processes

Moving average processes of an infinite order have already occurred when
we presented the Wold decomposition theorem. They are, above all, of
theoretical importance as, in practice, only a finite number of (different)
parameters can be estimated. In the following, we consider finite order
moving average processes. We start with the first order moving average
process and then discuss general properties of finite order moving average
processes.


2.2.1 First Order Moving Average Processes

The first order moving average process (MA(1)) is given by the following
equation:
(2.38)                        xt =            + ut –        ut-1 ,
or
(2.38')                        xt –           = (l – L)ut ,
with ut again being a pure random process. The Wold representation of an
MA(1) process (as of any finite order MA process) has a finite number of
terms. In this special case, the Wold coefficients are 0 = 1, 1 = - and j
                          2
= 0 for j 2. Thus,        j is finite for all finite values of , i.e. an MA(1)
                        j

process is always stationary.
  Taking expectations of (2.38) leads to
                    E[xt] =        + E[ut] –               E[ut-1] =            .
The variance can also be calculated directly,
                   V[xt] = E[(xt – )2]
                             = E[(ut –         ut-1)2]
                                                                     2
                             = E[( u 2 – 2 ut ut-1 +
                                     t                                   u 2 1 )]
                                                                           t

                                      2        2
                             = (1 +       )         =      (0) .
Therefore, the variance is constant at any point of time.
  For the covariances of the process we get
     E[(xt – )(xt+ – )] = E[(ut –             ut-1)(ut+ –           ut+ -1)]
                                                                                     2
                            = E[(utut+ –           utut+   –1   –        ut-1ut+ +       ut-1ut+ -1)] .
2.2 Moving Average Processes   59

The covariances are different from zero only for = ± 1, i.e. for adjoining
random variables. In this case
                                                                   2
                                    (1) = -                            .
Thus, for an MA(1) process, all autocovariances and therefore all autocorre-
lations with an order higher than one disappear, i.e. ( ) = ( ) = 0 for 2.
   The correlogram of an MA(1) process is

             (0) = 1,       (1) =                       2
                                                            ,          ( ) = 0 for   2.
                                            1
If we consider (1) as a function of , (1) = f( ), it holds that f(0) = 0 and
f( ) = -f(- ), i.e. that f( ) is point symmetric to the origin, and that |f( )|
0.5. f( ) has its maximum at = -1 and its minimum at = 1. Thus, an
MA(1) process cannot have a first order autocorrelation above 0.5 or be-
low -0.5.
   If we know the autocorrelation coefficient (1) = 1, for example, by es-
timation, we can derive (estimate) the corresponding parameter by using
the equation for the first order autocorrelation coefficient,
                                        2
                            (1 +            )       1   +          = 0.
The quadratic equation can also be written as
                                2           1
(2.39)                              +                       + 1 = 0,
                                                1

and it has the two solutions
                                         1                                 2
                      1,2   =                               1      1 4     1   .
                                        2 1
Thus, the parameters of the MA(1) process can be estimated non-linearly
with the method of moments: the theoretical moments are substituted by
their consistent estimates and the resulting equation is used for estimating
the parameters consistently.
   Because of | 1| 0.5, the quadratic equation always results in real roots.
They also have the property that 1 2 = 1. This gives us the possibility to
model the same autocorrelation structure with two different parameters,
where one is the inverse of the other.
   In order to get a unique parameterisation, we require a further property
of the MA(1) process. We ask under which conditions the MA(1) process
(2.38) can have an autoregressive representation. By using the lag operator
representation (2.38') we get
60    Univariate Stationary Processes


                                                                  1
                         ut = –                     +                      xt .
                                           1             1             L
An expansion of the series 1/(1 – L) is only possible for                                    < 1 and re-
sults in the following AR( ) process
                                                                            2
                 ut = –                + xt +            xt-1 +                 xt-2 + ...
                            1
or
                                       2
                 xt +     xt-1 +           xt-2 + ... =                           + ut .
                                                                       1
This representation requires the condition of invertibility (  < 1). In this
case, we get a unique parameterisation of the MA(1) process. Applying the
lag polynomial in (2.38'), we can formulate the invertibility condition in
the following way: An MA(1) process is invertible if and only if the root
of the lag polynomial
                                       1– L = 0
is larger than one in modulus.

Example 2.10
The following MA(1) process is given:
(E2.5)                    xt =     t    –        t-1,    t   ~ N(0, 22),
with = -0.5. For this process we get
                            E[xt] = 0,
                            V[xt] = (1 + 0.52)·4 = 5,
                                                 0.5
                                (1) =                            = 0.4,
                                               1 0.52
                                ( ) = 0 for                       2.
Solving the corresponding quadratic equation (2.39) for this value of (1) leads to
the two roots 1 = -2.0 and 2 = -0.5. If we now consider the process
(E2.5a)                   yt =     t    + 2       t-1,       t   ~ N(0, 1),
we obtain the following results:
                            E[yt] = 0,
                            V[yt] = (1 + 2.02)·1 = 5,
2.2 Moving Average Processes        61


                                         2.0
                               (1) =            = 0.4,
                                       1 2.02
                               ( ) = 0 for       2,
i.e. the variances and the autocorrelogram of the two processes (E2.5) and (E2.5a)
are identical. The only difference between them is that (E2.5) is invertible, be-
cause the invertibility condition      < 1 holds, whereas (E2.5a) is not invertible.
Thus, given the structure of the correlations, we can choose the one of the two
processes that fulfils the invertibility condition without imposing any restrictions
on the structure of the process.

With equation (2.37), the partial autocorrelation function of the MA(1)
process can be calculated in the following way:
        11   =   (1),
                  1      (1)
                  (1)    0               (1) 2
        22   =               =                 < 0,
                  1      (1)           1 (1) 2
                  (1)    1

                  1      (1)     (1)
                  (1)    1       0
                  0      (1)     0       (1)3
        33   =                       =                   0 for          0,
                  1      (1)     0     1 2 (1) 2
                  (1)    1       (1)
                  0      (1)     1

                  1      (1)     0      (1)
                  (1)    1       (1)    0
                  0      (1)     1      0
                  0      0       (1)    0                  (1) 4
        44   =                              =                          < 0,
                  1      (1)     0      0     (1      (1) 2 ) 2  (1) 2
                  (1)    1       (1)    0
                  0      (1)     1      (1)
                  0      0       (1)    1
etc.
62    Univariate Stationary Processes


  If is positive, (1) is negative and vice versa. This leads to the two
possible patterns of partial autocorrelation functions, exemplified by =
±0.8:
                 = 0.8,    ii     {-0.49,-0.31,-0.22, -0.17, ... } ,
                 = -0.8,   ii     {0.49,-0.31, 0.22, -0.17, ... } .
Thus, contrary to the AR(1) process, the autocorrelation function of the
MA(1) process breaks off, while the partial autocorrelation function does
not. These properties hold generally, since invertible finite order MA pro-
cesses are equivalent to infinite order AR processes.


2.2.2 MA(1) and Temporal Aggregation

The time series which are discussed in this book are measured in discrete
time, with intervals of equal length. Exchange rates, for example, are nor-
mally quoted at the end of each trading day. For econometric analyses,
however, monthly, quarterly, or even annual data are used, rather than the-
se daily values. Usually, averages or end-of-period data are used for tem-
poral aggregation.
   Thus, two aggregation schemes have to be distinguished. The first one is
skip sampling (or: systematic sampling) where only every mth data point is
recorded. If xt is the basic series at t = 1, 2, 3,…, the skip sampled series ys
with new time scale s is end-of-period data,
               y1 = xm, y2 = x2m, y3 = x3m, …, ys = xsm.
Such an aggregation is typical for stock variables. However, the second
scheme of averaging over m non-overlapping periods is also widely used,
in particular for rates or indices:
                                1
                    y1            xm     xm   1           ... x1
                                m
                                1
                    y2            x 2m    x 2m        1     ... x m   1
                                m


                                1
                    ys            x sm   x sm     1        ... x (s   1)m 1   .
                                m
2.2 Moving Average Processes               63

In the following, we do not present a general theory of temporal aggrega-
tion but just discuss a special case of particular applied interest, the ran-
dom walk, with
                                              xt = xt-1 + ut,
where an artificial MA(1) structure arises due to aggregation by averaging.
It is straightforward to see that systematic sampling does not affect the
random walk property, since in this case we can write
                                                                sm
                                         ys = x0 +                   ut .
                                                               t 1


From this representation we get
                                              ys = ys-1 +             s,

with   s   being white noise:
                        s       = usm + usm-1 + ... + u(s-1)m+1,
with E[ s] = 0 and
                                                               2
                                                         m     u      for     0
                       E(   s   ·       s–    ) =                               .
                                                          0          elsewhere

Hence, the random walk property is inherited by ys, only the variance of
the differences ys – ys-1 is inflated in the obvious way. In case of averaging,
 ys , matters get more complicated. It can, however, be shown that the dif-
ferences
                                              ys    ys   1            s

follow no longer a white noise process but an MA(1) scheme hidden be-
hind
              1
   s            u sm   2u sm        1        ...   mu s      1m 1
                                                                       ...   2u s   2 m 3
                                                                                            us   2 m 2
                                                                                                         .
              m
We omit details but refer to HOLBROOK WORKING (1960) who showed
that with increasing aggregation level, m , one obtains the autocorre-
lation function
64    Univariate Stationary Processes


                                                               1,      0
                                E        s    s                1
                      ( ) =                                      ,      1 .
                                    V         s                4
                                                               0, elsewhere

Note that the above autocorrelation function corresponds to the following
MA(1)-process

                                     s                    us    us    1


where u s is white noise, and the limiting value (for m                        ) of the MA pa-
rameter is
                                             3 2                      0.268.
GEORGE C. TIAO (1972) generalised this result the following way:
If xt – xt-1 is not generated by white noise but by an invertible MA(1) pro-
cess, then ys ys 1 behaves with growing m like the MA(1) process
 us     u s 1 , where is independent of the underlying MA(1) structure of xt
– xt-1. This result even continues to hold when the assumption that xt – xt-1
is MA(1) is replaced by a more general moving average process of higher
order as introduced in subsection 2.2.3.

Example 2.11
Consider averaging over m = 2 periods,
                                                  1
                                ys                  x 2s       x 2s   1
                                                                          .
                                                  2
For the random walk xt = xt-1 + ut, it holds that

                        s       ys           ys   1


                                1
                            =     (x2s + x2s-1 – x2s-2 – x2s-3)
                                2
                                1
                            =     ( u2s + 2 u2s-1 + u2s-2) .
                                2
This process can be described as

                        s       us           us       1


with = 2 2 – 3       –0.172, and
2.2 Moving Average Processes          65


                                                          3       2
                                                                  u       for       0
                                                          2
                                                          1       2
                        E(    s   ·       s   ) =                 u       for       1,
                                                          4
                                                              0            elsewhere

such that for m = 2 the autocorrelation coefficient at lag one becomes (1) = 1/6.

Example 2.12
Example 1.3 as well as Figure 1.8 present the end-of-month exchange rate be-
tween the Swiss Franc and the U.S. Dollar over the period from January 1974 to
December 2011. The autocorrelogram of the first differences of the logarithms of
this time series indicates that they follow a pure random process. The tests we ap-
plied did not reject this null hypothesis.
   If we use monthly averages instead of end-of-month data, the following MA(1)
process can be estimated for the first difference of the logarithms of this exchange
rate:
              ln(et) = -0.003 + ût + 0.308 ût-1,
                          (-1.53)        (6.91)
             R2 =             0.082, SE = 0.028, Q(11) = 8.216 (p = 0.694),
             JB =            21.194 (p = 0.000),
with the t values again given in parentheses. ln(·) denotes the natural logarithm.
The estimated coefficient of the MA(1) term is highly significantly different from
zero. The Ljung-Box Q-statistic indicates that there is no longer any significant
autocorrelation in the residuals. As m 20 is relatively large (in this context), the
estimated values of the MA(1) term should not be too different from the theoreti-
cal value given by GEORGE C. TIAO (1972). The theoretical value -0.268 lies in the
two-sigma confidence interval of the estimated parameter -0.308.


2.2.3 Higher Order Moving Average Processes

In general, the moving average process of order q (MA(q)) can be written
as
(2.40)           xt =         + ut –             1   ut-1 –           2   ut-2 – ... –   q   ut-q
with     q   0 and ut as a pure random process. Using the lag operator we get
                                                                    2                 q
(2.40')             xt –              = (1 –         1L   –       2L      – ... –   qL )ut

                                      =       (L)ut .
66   Univariate Stationary Processes

From (2.40) we see that we already have a finite order Wold representation
with k = 0 for k > q. Thus, there are no problems of convergence, and
every finite MA(q) process is stationary, no matter what values are used
for j, j = 1, 2, ..., q.
   For the expectation of (2.40) we immediately get E[xt] = . Thus, the
variance can be calculated as:
          V[xt] = E[(xt – )2]
                 = E[(ut –     1   ut-1 – ... –                   q    ut-q)2]
                                                                   2
                 = E[( u 2 +
                         t
                                   2
                                   1   u 2 1 + ... +
                                         t                         q   u2 q – 2
                                                                        t                1   utut-1 – ...
                      –2   q-1 q   ut-q+1ut-q)] .
From this we obtain
                                              2               2                  2           2
                  V[xt] = (1 +                1       +       2   + ... +        q   )           .

For the covariances of order we can write
         Cov[xt, xt+ ] = E[(xt – )(xt+ – )]
                      = E[(ut –           1ut-1 – ... – q ut-q)
                         (ut+ –           1 ut+ -1 – ... – q ut+ -q)]

                      = E[ut(ut+ – 1 ut+ -1 – ... – q ut+ -q)
                         – 1 ut-1(ut+ – 1 ut+ -1 – ... – q ut+ -q)

                           –   q   ut-q(ut+ –                 1   ut+ -1 – ... –                     q   ut+ -q)] .
Thus, for = 1, 2, ..., q we get
                                                                                                          2
                  = 1: (1) = (–                   1   +       1    2   + ... +       q-1         q)           ,
                                                                                                          2
                   = 2:    (2) = (–               2   +       1    3   + ... +       q-2         q)           ,
(2.41)

                                                      2
                   = q:    (q) = –            q           ,
while we have ( ) = 0 for > q.
   Consequently, all autocovariances and autocorrelations with orders
higher than the order of the process are zero. It is – at least theoretically –
possible to identify the order of an MA(q) process by using the autocorre-
logram.
   It can be seen from (2.41) that there exists a system of non-linear equa-
tions for given (or estimated) second order moments that determines
(makes it possible to estimate) the parameters 1, ..., q. As we have al-
2.2 Moving Average Processes   67

ready seen in the case of the MA(1) process, such non-linear equation sys-
tems have multiple solutions, i.e. there exist different values for 1, 2, ...
and q that all lead to the same autocorrelation structure. To get a unique
parameterisation, the invertibility condition is again required, i.e. it must
be possible to represent the MA(q) process as a stationary AR( ) process.
Starting from (2.40'), this implies that the inverse operator -1(L) can be
represented as an infinite series in the lag operator, where the sum of the
coefficients has to be bounded. Thus, the representation we get is an
AR( ) process
                                                      -1
                          ut = –                 +         (L) xt
                                          (1)

                                = –              +         c jx t   j   ,
                                          (1)        j 0


where
                                            q
              1 = (1 –     1L   – ... –   qL )(      1 + c1L + c2L2 + ... ),
and the parameters ci, i = 1, 2, ... are calculated by using again the method
of undetermined coefficients. Such a representation exists if all roots of
                                                   q
                           1–     1L   – ... –   qL         = 0
are larger than one in absolute value.

Example 2.13
Let the following MA(2) process
                           xt = ut + 0.6 ut-1 – 0.1 ut-2
be given, with a variance of 1 given for the pure random process u. For the vari-
ance of x we get
                     V[xt] = (1 + 0.36 + 0.01) 1 = 1.37 .
Corresponding to (2.41) the covariances are
                           (1) = + 0.6 – 0.06 = 0.54
                           (2) = – 0.1                                      .
                           ( ) = 0 for > 2
This leads to the autocorrelation coefficients (1) = 0.39 and (2) = -0.07. To
check whether the process is invertible, the quadratic equation
                            1 + 0.6 L       0.1 L2 = 0
68    Univariate Stationary Processes

has to be solved. As the two roots -1.36 and 7.36 are larger than 1 in absolute val-
ue, the invertibility condition is fulfilled, i.e. the MA(2) process can be written as
an AR( ) process
                       xt = (1 + 0.6 L – 0.1 L2) ut ,
                                   1
                       ut =                xt
                              1 0.6L 0.1L2
                          = (1 + c1 L + c2 L2 + c3 L3 +         ) xt .
The unknowns ci, i = 1, 2, ..., can be determined by comparing the coefficients of
the polynomials in the following way:
                1 = (1 + 0.6 L – 0.1 L2)(1 + c1 L + c2 L2 + c3 L3 +           )
                                             2             3
                1 = 1 + c1 L +            c2 L +       c3 L +
                        + 0.6 L + 0.6 c1 L + 0.6 c2 L3 +
                                             2


                                       0.1 L2      0.1 c1 L3
It holds that
                               c1 + 0.6     = 0         c1 =        0.60,
                     c2 + 0.6 c1 – 0.1      = 0          c2 =       0.46,
                     c3 + 0.6 c2 – 0.1 c1 = 0            c3 =       0.34,
                     c4 + 0.6 c3 – 0.1 c2 = 0            c4 =       0.25,
                                                                         .
Thus, we get the following AR( ) representation
           xt – 0.6 xt-1 + 0.46 xt-2 – 0.34 xt-3 + 0.25 xt-4                 = ut .
Similarly to the MA(1) process, the partial autocorrelation function of the MA(q)
process does not break off. As long as the order q is finite, the MA(q) process is
stationary whatever its parameters are. If the order tends towards infinity, howev-
er, for the process to be stationary the series of the coefficients has to converge
just like in the Wold representation.


2.3 Mixed Processes

If we take a look at the two different functions that can be used to identify
autoregressive and moving average processes, we see from Table 2.1 that
the situation in which neither of them breaks off can only arise if there is
an MA( ) process that can be inverted to an AR( ) process, i.e. if the
Wold representation of an AR( ) process corresponds to an MA( ) pro-
cess. However, as pure AR or MA representations, these processes cannot
2.3 Mixed Processes       69

be used for empirical modelling because they can only be characterised by
means of infinitely many parameters. After all, according to the principle
of parsimony, the number of estimated parameters should be as small as
possible when applying time series methods.
   In the following, we introduce processes which contain both an auto-
regressive (AR) term of finite order p and a moving average (MA) term of
finite order q. Hence, these mixed processes are denoted as ARMA(p,q)
processes. They enable us to describe processes in which neither the auto-
correlation nor the partial autocorrelation function breaks off after a finite
number of lags. Again, we start with the simplest case, the ARMA(1,1)
process, and consider the general case afterwards.


          Table 2.1: Characteristics of the Autocorrelation and the Partial
                     Autocorrelation Functions of AR and MA Processes

                                                             Partial Autocorrelation
                       Autocorrelation Function
                                                                     Function

    MA(q)                  breaks off with q                     does not break off

     AR(p)                does not break off                     breaks off with p



2.3.1 ARMA(1,1) Processes

An ARMA(1,1) process can be written as follows,
(2.42)                    xt =       +     xt-1 + ut –       ut-1 ,
or, by using the lag operator
(2.42')                  (1 – L) xt =              + (1 –   L) ut ,
where ut is a pure random process. To get the Wold representation of an
ARMA(1,1) process, we solve (2.42') for xt,
                                                   1    L
                            xt =               +          ut .
                                     1             1    L
It is obvious that        must hold, because otherwise xt would be a pure
random process fluctuating around the mean = /(1 – ). The j, j = 0, 1,
2, ..., can be determined as follows:
70    Univariate Stationary Processes


            1       L                                            2              3
                      =         0       +   1L     +           2L     +       3L       + …
            1       L
                                                                                    2                  3
            1 – L = (1 – L)(                       0       +       1L     +       2L     +           3L     + …)
                                                                  2                     3
            1– L =              0       +   1L  +               2L        +           3L      + …
                                                                   2                    3
                                        –   0 L –               1L        –           2L      – … .
Comparing the coefficients of the two lag polynomials we get
                        L0:         0    = 1
                        L1:         1   –      0   = –                        1   =         –
                        L2:         2   –      1   = 0                        2   =      ( – )
                        L3:         3   –      2   = 0                        3   =     2
                                                                                            ( – )


                        Lj:      j   –      j-1    = 0                        j   =     j-1
                                                                                              ( – ).
The j, j        2 can be determined from the linear homogeneous difference
equation
                                                   j   –        j-1   =0
with 1 = – as initial condition. The j converge towards zero if and
only if | | < 1. This corresponds to the stability condition of the AR term.
Thus, the ARMA(1,1) process is stationary if, with stochastic initial condi-
tions, it has a stable AR(1) term. The Wold representation is
                                                                                                       2
(2.43) xt =                   + ut + ( – ) ut-1 +                         ( – ) ut-2 +                     ( – ) ut-3 + ... .
                    1
Thus, the ARMA(1,1) process can be written as an MA( ) process.
   To invert the MA(1) part, | | < 1 must hold. Starting from (2.42') leads
to
                                                                      1       L
                                 ut =                          +                xt .
                                               1                      1       L
If 1/(1 – L) is developed into a geometric series we get
                                                                      2 2
     ut =               + (1 – L)(1 + L +                             L + ... ) xt
                1

                                                                                                2
        =               + xt + ( – ) xt-1 + ( – ) xt-2 +                                            ( – ) xt-3 + ... .
                1
2.3 Mixed Processes                  71

This proves to be an AR( ) representation. It shows that the combination
of an AR(1) and an MA(1) term leads to a process with both MA( ) and
AR( ) representation if the AR term is stable and the MA term invertible.
   We obtain the first and second order moments of the stationary process
in (2.42) as follows:
                    E[xt] = E[ +                 xt-1 + ut –                       ut-1]
                           =       +           E[xt-1] .
Due to E[xt] = E[xt-1] =   , we get

                                       =                       ,
                                                1
i.e. the expectation is the same as in an AR(1) process.
   If we set = 0 without loss of generality, the expectation is zero. The
autocovariance of order       0 can then be written as
(2.44)            E[xt- xt] = E[xt- ( xt-1 + ut –                                  ut-1)],
which leads to
                     (0) =        (1) + E[xtut] –                  E[xtut-1]
                                           2                                                 2
for = 0. Due to (2.43), E[xtut] =              and E[xtut-1] = ( – )                             . Thus, we can
write
                                                                                     2
(2.45)                (0) =          (1) + (1 – ( – ))                                   .
(2.44) leads to
                   (1) =        (0) + E[xt-1ut] –                  E[xt-1ut-1]
for = 1. Because of (2.43) this can be written as
                                                                   2
(2.46)                         (1) =             (0) –                 .
If we insert (2.46) in (2.45) and solve for (0), the resulting variance of the
ARMA(1,1) process is
                                                     2
                                       1                   2           2
(2.47)                       (0) =                         2
                                                                           .
                                                 1
Inserting this into (2.46), we get
                                       (             )(1           )           2
(2.48)                     (1) =                           2
                                                 1
72      Univariate Stationary Processes

for the first order autocovariance. For                   2, (2.44) results in the autoco-
variances
(2.49)                               ( ) =            ( -1)
and the autocorrelations
(2.50)                               ( ) =            ( -1) .
This results in the same difference equation as in an AR(1) process but,
however, with the different initial condition
                                          (       )(1           )
                             (1) =                2
                                                                    .
                                              1           2
The first order autocorrelation coefficient is influenced by the MA term,
while the higher order autocorrelation coefficients develop in the same
way as in an AR(1) process.
   If the process is stable and invertible, i.e. for | | < 1 and | | < 1, the sign
of (1) is determined by the sign of ( – ) because of (1 + 2 – 2 ) > 0
and (1 – ) > 0. Moreover, it follows from (2.49) that the autocorrelation
function – as in the AR(1) process – is monotonic for > 0 and oscillating
for < 0. Due to | | < 1 with increasing, the autocorrelation function also
decreases in absolute value.
   Thus, the following typical autocorrelation structures are possible:
(i)      > 0 and     > : The autocorrelation function is always positive.
(ii)      < 0 and < : The autocorrelation function oscillates; the initial
        condition (1) is negative.
(iii)     > 0 and     < : The autocorrelation function is negative from (1)
        onwards.
(iv)      < 0 and > : The autocorrelation function oscillates; the initial
        condition (1) is positive.
Figure 2.9 shows the development of the corresponding autocorrelation
functions up to = 20 for the parameter values ,        {0.8, 0.5, -0.5, -0.8}
in which, of course,        must always hold, as otherwise the ARMA(1,1)
process degenerates to a pure random process.
   For the partial autocorrelation function we get
                             (        )(1         )
            11   =   (1) =            2
                                                      ,
                                 1            2
2.3 Mixed Processes        73




              1
            0.8
            0.6
            0.4
            0.2
              0
           -0.2           5          10          15         20
           -0.4
           -0.6
           -0.8
             -1


              1
            0.8
            0.6
            0.4
            0.2
              0
           -0.2           5          10          15         20
           -0.4
           -0.6
           -0.8
             -1


              1
            0.8
            0.6
            0.4
            0.2
              0
           -0.2           5          10          15         20
           -0.4
           -0.6
           -0.8
             -1


              1
            0.8
            0.6
            0.4
            0.2
              0
           -0.2           5          10          15         20
           -0.4
           -0.6
           -0.8
             -1



Figure 2.9: Theoretical autocorrelation functions of ARMA(1,1) processes
74    Univariate Stationary Processes


                    1        (1)
                    (1)      (2)          (2) (1)2                     (1)(   (1))
          22   =                 =                 =                          2
                                                                                   ,
                    1        (1)          1 (1)2                        1 (1)
                    (1)      1
because of (2) =          (1),
                    1        (1)    (1)                      1         (1)           (1)
                    (1)      1     (2)                       (1)       1              (1)
                                                                                   2
                    (2)      (1)   (3)                        (1)      (1)             (1)
          33   =                       =
                    1        (1)   (2)   1 2                    (1)3          (1) 2 (2       2
                                                                                                 )
                    (1)      1     (1)
                    (2)      (1)    1

                            (1)(      (1)) 2
               =                                            , etc.
                   1 2      (1)3    (1) 2 (2        2
                                                        )
Thus, the ARMA(1,1) process is a stationary stochastic process where nei-
ther the autocorrelation nor the partial autocorrelation function breaks off.
   The following example shows how, due to measurement error, an
AR(1)-process becomes an ARMA(1,1) process.

Example 2.14
The ‘true’ variable x t is generated by a stationary AR(1) process,

(E2.8)                             xt =        xt   1       + ut ,

but it can only be measured with an error vt, i.e. for the observed variable xt it
holds that
(E2.9)                              xt = x t + vt ,

where vt is a pure random process uncorrelated with the random process ut. (The
same model was used in Example 2.3 but with a different interpretation.) If we
transform (E2.8) to
                                                    ut
                                     xt   =
                                               1            L
and insert it into (E2.9) we get
                           (1 – L) xt = ut + vt –                    vt-1 .
2.3 Mixed Processes                    75

For the combined error term         t    = ut + vt –                 vt-1 we get
                                                         2                         2      2
                                   (0) =                 u       + (1 +             )     v

                                                                     2
                                   (1) = -                           v

                                   ( ) = 0 for                                     2,
or
                                             2
                                             v
                  (1) =       2                  2           2
                                                                 ,            ( ) = 0 for                   2.
                              u      (1              )       v


Thus, the observable variable xt follows an ARMA(1,1) process,
                                  (1 –     L) xt = (1 – L)                               t   ,
where can be calculated by means of (1) and t is a pure random pro-
cess. (See also the corresponding results in Section 2.2.1.)


2.3.2 ARMA(p,q) Processes

The general autoregressive moving average process with AR order p and
MA order q can be written as
(2.51) xt =       +     1   xt-1 + ... +                 p       xt-p + ut –                     1   ut-1 – ... –     q    ut-q ,
with ut being a pure random process and                                        p       0 and           q   0 having to hold.
Using the lag operator, we can write
                                      p                                                                       q
(2.51')    (1 –   1L   – ... –      pL )    xt =                         + (1 –          1L      – ... –    qL )    ut ,
or
(2.51'')                            (L) xt =                             +     (L) ut .
As factors that are common in both polynomials can be reduced, (L) and
  (L) cannot have identical roots. The process is stationary if – with sto-
chastic initial conditions – the stability conditions of the AR term are ful-
filled, i.e. if (L) only has roots that are larger than 1 in absolute value.
Then we can derive the Wold representation for which
                         (L) =             (L)(1 +                       1L   +    2    L2 + ... )
must hold. Again, the j, j = 1, 2, ..., can be calculated by comparing the
coefficients. If, likewise, all roots of (L) are larger than 1 in absolute val-
ue, the ARMA(p,q) process is also invertible.
  A stationary and invertible ARMA(p,q) process may either be repre-
sented as an AR( ) or as an MA( ) process. Thus, neither its autocorrela-
76    Univariate Stationary Processes

tion nor its partial autocorrelation function breaks off. In short, it is possi-
ble to generate stationary stochastic processes with infinite AR and MA
orders by using only a finite number of parameters.
   Under the assumption of stationarity, (2.51) directly results in the con-
stant mean

                               E[xt] =                =                                .
                                                          1       1               p


If, without loss of generality, we set = 0 and thus also                                              = 0, we get the
following relation for the autocovariances:
     ( ) = E[xt- xt]
             = E[xt- (     1   xt-1 + ... +           p   xt-p + ut –             1   ut-1 – ... –          q   ut-q)] .
This relation can also be written as
          ( ) =     1 ( -1) +   2 ( -2) + ... +                               p       ( -p)
                       + E[xt- ut] –          1   E[xt- ut-1] – ... –                      q   E[xt- ut-q] .
Due to the Wold representation, the covariances between xt- and ut-i, i = 0,
..., q, are zero for > q, i.e. the autocovariances for > q and > p are gen-
erated by the difference equation of an AR(p) process,
     ( ) –     1   ( -1) –       2   ( -2) – ... –            p   ( -p) = 0 for > q                               >p
whereas the first q autocovariances are also influenced by the MA part.
Normalisation with (0) leads to exactly the same results for the autocorre-
lations.
   If the orders p and q are given and the distribution of the white noise
process ut is known, the parameters of an ARMA(p,q) process can be esti-
mated consistently by using maximum likelihood methods. These esti-
mates are also asymptotically efficient. If there is no such programme
available, it is possible to estimate the parameters consistently with least
squares. As every invertible ARMA(p,q) process is equivalent to an
AR( ) process, first of all an AR(k) process is estimated with k sufficient-
ly larger than p. From this, one can get estimates of the non-observable re-
siduals ût. By employing these residuals, the ARMA(p,q) process can be
estimated with the least squares method,
      xt =         +   1   xt-1 + ... +           p   xt-p –          1   ût-1 – ... –            q   ût-q + vt .
This approach can also be used if p and q are unknown. These orders can,
for example, be determined by using the information criteria shown in Sec-
tion 2.1.5.
2.3 Mixed Processes     77



         Percent
            8
            7
            6
            5
            4
            3
            2
            1
            0                                                              year
                1994       1996       1998        2000        2002
                       a) New York three month money market rate,
                          1994 – 2003
           1
         0.8
         0.6
         0.4
         0.2
           0
         -0.2               5             10             15           20
         -0.4
         -0.6
         -0.8            b) Autocorrelation (__) and partial ( )
           -1               autocorrelation functions of the first
                            differences with confidence intervals
           ˆ

           1
         0.8
         0.6
         0.4
         0.2
           0
         -0.2               5            10              15           20
         -0.4
         -0.6
         -0.8          c) Autocorrelation function of the residuals
          -1              of the estimated ARMA(1,1)-process
                          with confidence intervals



Figure 2.10: Three month money market rate in New York, 1994 – 2003
78    Univariate Stationary Processes

Example 2.15
Figure 2.10 shows the development of the US three month money market rate
(USR) as well as the estimated autocorrelation and partial autocorrelation function
of the first differences of this time series for the period from March 1994 to Au-
gust 2003 (114 observations). Both functions do not show a clear break-off behav-
iour. Therefore, the following ARMA(1,1) model has been estimated for this time
series:
             USRt =      – 0.006 + 0.831 USRt-1 + ût – 0.457 ût-1,.
                         (-0.73) (10.91)              (-3.57)
            R 2 = 0.351, SE = 0.166, Q(10) = 7.897 (p = 0.639).
The AR(1) as well as the MA(1) terms are different from zero and from one at any
usual significance level. The autocorrelogram of the estimated residuals, which is
also given in Figure 2.10, as well as the Ljung-Box Q statistic, which is calculated
for this model with 12 autocorrelation coefficients (i.e. with 10 degrees of free-
dom), do not provide any evidence of a higher order process.


2.4 Forecasting

As mentioned in the introduction, in the 1970’s, one of the reasons for the
broad acceptance of time series analysis using the Box-Jenkins approach
was the fact that forecasts with this comparably simple method often out-
performed forecasts generated by large econometric models. In the follow-
ing, we show how ARMA models can be used for making forecasts about
the future development of time series. In doing so, we assume that all ob-
servations of the time series up to time t are known.


2.4.1 Forecasts with Minimal Mean Squared Errors

We want to solve the problem of making a -step ahead forecast for xt with
a linear prediction function, given a stationary and/or invertible data gen-
erating process.
        ˆ                                                    ˆ
   Let x t ( ) be such a prediction function for xt+ . Thus, x t ( ) is a random
variable for given t and . As all stationary ARMA processes have a Wold
representation, we assume the existence of such a representation without
loss of generality. Thus,
                                                                2
                xt =    +         j   ut j ,   0   = 1,         j   <   ,
                            j 0                           j 0
2.4 Forecasting                    79

where ut is a pure random process with the usual properties E[ut] = 0,
                                                                        2
                                                                                    for t s
                                  E[utus] =                                                 .
                                                                       0            for t s

Therefore, it also holds that

(2.52)        x_{t+τ}  =  μ + Σ_{j=0}^{∞} ψj u_{t+τ-j} ,    τ = 1, 2, ... .


For a linear prediction function with the information given up to time t, we
assume the following representation,

(2.53)        x̂t(τ)  =  μ + Σ_{k=0}^{∞} ψ*_{τk} u_{t-k} ,    τ = 1, 2, ... ,


where the ψ*_{τk}, k = 0, 1, 2, ..., τ = 1, 2, ..., are unknown coefficients. The
forecast error of a τ-step forecast is ft(τ) = xt+τ – x̂t(τ), τ = 1, 2, ... . In order
to make a good forecast, these errors should be small. The expected quad-
ratic forecast error E[(xt+τ – x̂t(τ))²], which is to be minimised, is used as the
criterion to determine the unknowns ψ*_{τk}. Taking into account (2.52) and
(2.53) we can write

   E[ft(τ)²]  =  E[( Σ_{j=0}^{∞} ψj u_{t+τ-j}  –  Σ_{k=0}^{∞} ψ*_{τk} u_{t-k} )²]

              =  E[( u_{t+τ} + ψ1 u_{t+τ-1} + ... + ψ_{τ-1} u_{t+1}
                     + Σ_{k=0}^{∞} (ψ_{τ+k} – ψ*_{τk}) u_{t-k} )²] .

From this it follows that

(2.54)   E[ft(τ)²]  =  (1 + ψ1² + ... + ψ_{τ-1}²) σ²  +  σ² Σ_{k=0}^{∞} (ψ_{τ+k} – ψ*_{τk})² .


The variance of the forecast error reaches its minimum if we set ψ*_{τk} = ψ_{τ+k}
for k = 0, 1, 2, ... . Thus, we get the optimal linear prediction function for a
τ-step ahead forecast from (2.53) as

(2.55)        x̂t(τ)  =  μ + Σ_{k=0}^{∞} ψ_{τ+k} u_{t-k} ,    τ = 1, 2, ... .

For the conditional expectation of ut+s, given ut, ut-1, …, it holds that

              E[ut+s | ut, ut-1, ...]  =  ut+s   for s ≤ 0,
              E[ut+s | ut, ut-1, ...]  =  0      for s > 0.

Thus, because of (2.52), we get the conditional expectation of xt+τ as

              E[xt+τ | ut, ut-1, ...]  =  μ + Σ_{k=0}^{∞} ψ_{τ+k} u_{t-k} .

Due to (2.55), the conditional expectation of xt+τ, given all information
available at time t, is identical to the optimal prediction function. This
leads to the following result: the conditional expectation of xt+τ, given all
information up to time t, provides the τ-step forecast with minimal mean
squared prediction error.
   With (2.52) and (2.55) the τ-step forecast error can be written as

(2.56)   ft(τ)  =  xt+τ – x̂t(τ)  =  u_{t+τ} + ψ1 u_{t+τ-1} + ψ2 u_{t+τ-2} + ... + ψ_{τ-1} u_{t+1}

with

              E[ft(τ) | ut, ut-1, ...]  =  E[ft(τ)]  =  0 .
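
Before turning to the general conclusions below, a small numerical sketch of
(2.55) may be useful. It assumes a truncated sequence of ψ-weights and a finite,
invented history of shocks; everything in it is illustrative and not taken from the
text.

```python
import numpy as np

def wold_forecast(psi, u_hist, mu, tau):
    # (2.55): x̂_t(tau) = mu + sum_k psi[tau+k] * u_{t-k}, truncated at the
    # length of the available shock history u_hist = (u_t, u_{t-1}, ...)
    psi = np.asarray(psi, dtype=float)
    u_hist = np.asarray(u_hist, dtype=float)
    weights = psi[tau:tau + u_hist.size]
    return mu + np.dot(weights, u_hist[:weights.size])

# illustrative AR(1)-type weights psi_j = 0.8**j and invented shocks
psi = 0.8 ** np.arange(200)
rng = np.random.default_rng(1)
u_hist = rng.normal(size=150)                 # u_t, u_{t-1}, ...
for tau in (1, 2, 5, 20):
    print(tau, wold_forecast(psi, u_hist, mu=0.0, tau=tau))
```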
From these results we can immediately draw some conclusions:
1. Best linear unbiased predictions (BLUP) of stationary ARMA processes
   are given by the conditional expectation of xt+τ, τ = 1, 2, …,

              x̂t(τ)  =  E[xt+τ | xt, xt-1, ...]  =  Et[xt+τ] .
2. For the one-step forecast errors (τ = 1), ft(1) = ut+1, we get

              E[ft(1)]  =  E[ut+1]  =  0,  and

              E[ft(1) fs(1)]  =  E[ut+1 us+1]  =  σ²   for t = s,
              E[ft(1) fs(1)]  =  E[ut+1 us+1]  =  0    for t ≠ s.

   The one-step forecast errors are a pure random process; they are identi-
   cal with the residuals of the data generating process. If the one-step
   prediction errors were correlated, the prediction could be improved by
   using the information contained in the prediction errors. In such a case,
   however, x̂t(1) would not be an optimal forecast.
3. For the τ-step forecast errors (τ > 1) we get

              ft(τ)  =  u_{t+τ} + ψ1 u_{t+τ-1} + ψ2 u_{t+τ-2} + ... + ψ_{τ-1} u_{t+1} ,

   i.e. they follow an MA(τ-1) process with E[ft(τ)] = 0 and the variance

   (2.57)     V[ft(τ)]  =  (1 + ψ1² + ... + ψ_{τ-1}²) σ² .

   This variance can be used for constructing confidence intervals for τ-
   step forecasts. However, these intervals are too narrow for practical ap-
   plications because they do not take into account the uncertainty in the
   estimation of the parameters ψi, i = 1, 2, ..., τ-1.
4. It follows from (2.57) that the forecast error variance increases mono-
   tonically with increasing forecast horizon τ:

              V[ft(τ)]  ≥  V[ft(τ-1)] .
5. Due to (2.57) we get for the limit

     lim_{τ→∞} V[ft(τ)]  =  lim_{τ→∞} (1 + ψ1² + ... + ψ_{τ-1}²) σ²  =  σ² Σ_{j=0}^{∞} ψj²  =  V[xt] ,

   i.e. the variance of the τ-step forecast error is not larger than the vari-
   ance of the underlying process.
6. The following variance decomposition follows from (2.55) and (2.56)
   (a numerical check is sketched after these conclusions):

   (2.58)     V[xt+τ]  =  V[x̂t(τ)]  +  V[ft(τ)] .
7. Furthermore,

     lim_{τ→∞} x̂t(τ)  =  lim_{τ→∞} ( μ + Σ_{k=0}^{∞} ψ_{τ+k} u_{t-k} )  =  μ  =  E[xt] ,

   i.e. for increasing forecast horizons, the forecasts converge to the (un-
   conditional) mean of the series.
The concept of ‘weak’ rational expectations whose information set is re-
stricted to the current and past values of a variable exactly corresponds to
the optimal prediction approach used here.
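
Conclusions 4 to 7 are easy to check numerically. The sketch below uses assumed
AR(1) parameters, so that ψj = α^j; it evaluates the forecast error variance (2.57)
for increasing horizons and shows that it rises monotonically towards the process
variance σ²/(1 – α²), while the forecast variance implied by the decomposition
(2.58) shrinks to zero. All numbers are illustrative.

```python
import numpy as np

alpha, sigma2 = 0.8, 1.0                  # assumed AR(1) parameter and innovation variance
var_x = sigma2 / (1.0 - alpha ** 2)       # V[x_t]

for tau in (1, 2, 5, 10, 50):
    psi = alpha ** np.arange(tau)         # psi_0, ..., psi_{tau-1}
    var_f = sigma2 * np.sum(psi ** 2)     # (2.57): V[f_t(tau)]
    var_xhat = var_x - var_f              # (2.58): V[x̂_t(tau)]
    print(f"tau={tau:3d}  V[f]={var_f:.4f}  V[xhat]={var_xhat:.4f}  V[x]={var_x:.4f}")
```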


2.4.2 Forecasts of ARMA(p,q) Processes

The Wold decomposition employed in the previous section has advantages
when it comes to the derivation of theoretical results, but it is not practical-
ly useful for forecasting. Thus, in the following, we will discuss forecasts
directly using AR, MA, or ARMA representations.

Forecasts with a Stationary AR(1) Process

For this process, it holds that

              xt  =  δ + α xt-1 + ut ,

with |α| < 1. The optimal τ-step forecast is the conditional mean of xt+τ, i.e.

              Et[xt+τ]  =  Et[δ + α xt+τ-1 + ut+τ]  =  δ + α Et[xt+τ-1] .

Due to the first conclusion, we get the following first order difference
equation for the prediction function,

              x̂t(τ)  =  δ + α x̂t(τ-1) ,
which can be solved recursively:
     τ = 1:   x̂t(1)  =  δ + α x̂t(0)  =  δ + α xt

     τ = 2:   x̂t(2)  =  δ + α x̂t(1)  =  δ (1 + α) + α² xt

       ...

              x̂t(τ)  =  δ (1 + α + ... + α^{τ-1}) + α^τ xt

                      =  δ (1 – α^τ)/(1 – α) + α^τ xt
                      =  δ/(1 – α) + α^τ (xt – δ/(1 – α)) .

As μ = δ/(1 – α) is the mean of a stationary AR(1) process,

              x̂t(τ)  =  μ + α^τ (xt – μ)    with    lim_{τ→∞} x̂t(τ)  =  μ ,

i.e., with increasing forecast horizon τ, the predicted values of an AR(1)
process converge geometrically to the unconditional mean of the pro-
cess. The convergence is monotonic if α is positive, and oscillating if α is
negative.
   To calculate the τ-step prediction error, the Wold representation, i.e. the
MA(∞) representation of the AR(1) process, can be used,

              xt  =  μ + ut + α ut-1 + α² ut-2 + α³ ut-3 + ... .

Due to (2.56) and (2.57) we get the MA(τ-1) process

              ft(τ)  =  u_{t+τ} + α u_{t+τ-1} + α² u_{t+τ-2} + ... + α^{τ-1} u_{t+1}

for the forecast error with the variance

              V[ft(τ)]  =  (1 + α² + α⁴ + ... + α^{2(τ-1)}) σ²  =  σ² (1 – α^{2τ})/(1 – α²) .

With increasing forecast horizon, it follows that

              lim_{τ→∞} V[ft(τ)]  =  σ²/(1 – α²)  =  V[xt] ,

i.e. the prediction error variance converges to the variance of the AR(1)
process.
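
A short sketch of the AR(1) prediction function under assumed values of δ, α and
xt (all purely illustrative), comparing the recursion with the closed form
μ + α^τ (xt – μ):

```python
import numpy as np

def ar1_forecasts(delta, alpha, x_t, max_tau):
    # recursive forecasts x̂_t(tau) = delta + alpha * x̂_t(tau-1), with x̂_t(0) = x_t
    mu = delta / (1.0 - alpha)
    xhat, out = x_t, []
    for tau in range(1, max_tau + 1):
        xhat = delta + alpha * xhat
        closed_form = mu + alpha ** tau * (x_t - mu)   # same number as the recursion
        out.append((tau, xhat, closed_form))
    return mu, out

# illustrative parameter values
mu, forecasts = ar1_forecasts(delta=1.0, alpha=0.8, x_t=9.0, max_tau=20)
print("unconditional mean:", mu)                       # 5.0
for tau, rec, closed in forecasts[:3] + forecasts[-1:]:
    print(tau, round(rec, 4), round(closed, 4))        # both columns converge to mu
```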

Forecasts with Stationary AR(p) Processes

Starting with the representation

              xt  =  δ + α1 xt-1 + α2 xt-2 + ... + αp xt-p + ut ,

the conditional mean of xt+τ is given by

              Et[xt+τ]  =  δ + α1 Et[xt+τ-1] + ... + αp Et[xt+τ-p] .

Here,

              Et[xt+s]  =  x̂t(s)   for s > 0,
              Et[xt+s]  =  xt+s    for s ≤ 0.

Thus, the above difference equation can be solved recursively:

     τ = 1:   x̂t(1)  =  δ + α1 xt + α2 xt-1 + ... + αp xt+1-p

     τ = 2:   x̂t(2)  =  δ + α1 x̂t(1) + α2 xt + ... + αp xt+2-p ,  etc.
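
The same recursion in code, for a general AR(p): forecasts replace unknown
future values, observations are used where available. The coefficients and starting
values below are invented; x_hist holds the last p observations, most recent first.

```python
import numpy as np

def ar_p_forecasts(delta, alphas, x_hist, max_tau):
    # x_hist = (x_t, x_{t-1}, ..., x_{t-p+1})
    alphas = np.asarray(alphas, dtype=float)
    state = list(np.asarray(x_hist, dtype=float)[:alphas.size])
    forecasts = []
    for _ in range(max_tau):
        xhat = delta + float(np.dot(alphas, state))
        forecasts.append(xhat)
        state = [xhat] + state[:-1]       # the forecast becomes the "most recent" value
    return forecasts

# illustrative stationary AR(2): x_t = 1 + 1.5 x_{t-1} - 0.56 x_{t-2} + u_t
print(ar_p_forecasts(delta=1.0, alphas=[1.5, -0.56], x_hist=[18.0, 17.5], max_tau=5))
```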

Forecasts with an Invertible MA(1) Process

For this process, it holds that

              xt  =  μ + ut – β ut-1

with |β| < 1. The conditional mean of xt+τ is

              Et[xt+τ]  =  μ + Et[ut+τ] – β Et[ut+τ-1] .

For τ = 1, this leads to

(2.59)        x̂t(1)  =  μ – β ut ,

and for τ ≥ 2 we get

              x̂t(τ)  =  μ ,

i.e. the unconditional mean is the optimal forecast of xt+τ, τ = 2, 3, ... . For
the τ-step prediction errors and their variances we get:

              ft(1)  =  ut+1 ,                        V[ft(1)]  =  σ²
              ft(2)  =  ut+2 – β ut+1 ,               V[ft(2)]  =  (1 + β²) σ²
                ...
              ft(τ)  =  u_{t+τ} – β u_{t+τ-1} ,       V[ft(τ)]  =  (1 + β²) σ² .
To be able to perform the one-step forecasts (2.59), the unobservable vari-
able u has to be expressed as a function of the observable variable x. To do
this, it must be taken into account that for s ≤ t the one-step forecast errors
can be written as

(2.60)                     us  =  xs – x̂s-1(1) .

For t = 0, we get from (2.59)

                           x̂0(1)  =  μ – β u0

with the non-observable but fixed u0. Taking (2.60) into account, we get
for t = 1

     x̂1(1)  =  μ – β u1  =  μ – β (x1 – x̂0(1))
             =  μ – β x1 + β (μ – β u0)
             =  μ (1 + β) – β x1 – β² u0 .

Correspondingly, we get for t = 2

     x̂2(1)  =  μ – β u2  =  μ – β (x2 – x̂1(1))
             =  μ – β x2 + β (μ (1 + β) – β x1 – β² u0)
             =  μ (1 + β + β²) – β x2 – β² x1 – β³ u0 .

If we continue this procedure, the so-called backcasting, we finally arrive
at a representation of the one-step forecast which – except for u0 – con-
sists only of observable terms,

     x̂t(1)  =  μ (1 + β + ... + β^t) – β xt – β² xt-1 – ... – β^t x1 – β^{t+1} u0 .

Due to the invertibility of the MA(1) process, i.e. for |β| < 1, the impact of
the unknown initial value u0 finally disappears.
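
The backcasting recursion can be written in a few lines. The sketch below uses
invented MA(1) data and shows that two different (arbitrary) choices of u0 lead to
practically the same one-step forecast once t is large, as claimed above.

```python
import numpy as np

def ma1_one_step_forecasts(x, mu, beta, u0=0.0):
    # x̂_0(1) = mu - beta*u_0; then u_s = x_s - x̂_{s-1}(1) by (2.60) and x̂_s(1) = mu - beta*u_s
    xhat = mu - beta * u0
    forecasts = [xhat]
    for x_s in x:
        u_s = x_s - xhat
        xhat = mu - beta * u_s
        forecasts.append(xhat)
    return forecasts

# invented MA(1) data: x_t = mu + u_t - beta*u_{t-1}
rng = np.random.default_rng(2)
mu, beta = 5.0, 0.6
u = rng.normal(size=201)
x = mu + u[1:] - beta * u[:-1]
f_a = ma1_one_step_forecasts(x, mu, beta, u0=0.0)
f_b = ma1_one_step_forecasts(x, mu, beta, u0=10.0)
print(abs(f_a[-1] - f_b[-1]))         # negligible: the influence of u_0 has died out
```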
   Similarly, one can show that, after q forecast steps, the optimal forecasts
of invertible MA(q) processes, q > 1 are equal to the unconditional mean
of the process and that the variance of the forecast errors is equal to the
variance of the underlying process. The forecasts in observable terms are
represented similarly to those of the MA(1) process.

Forecasts with ARMA(p,q) Processes

Forecasts for these processes result from combining the approaches for
pure AR and MA processes. Thus, the one-step ahead forecast of a station-
ary and invertible ARMA(1,1) process is given by

              x̂t(1)  =  δ + α xt – β ut .

Starting with t = 0 and taking (2.60) into account, forecasts are successive-
ly generated by backcasting. We first get

              x̂0(1)  =  δ + α x0 – β u0 ,

where x0 and u0 are assumed to be any fixed numbers. For t = 1 we get

     x̂1(1)  =  δ + α x1 – β u1  =  δ + α x1 – β (x1 – x̂0(1))

             =  δ (1 + β) + (α – β) x1 + α β x0 – β² u0 ,

which finally leads to

(2.61)  x̂t(1)  =  δ (1 + β + ... + β^t) + (α – β) xt + β (α – β) xt-1 + ...
                   + β^{t-1} (α – β) x1 + α β^t x0 – β^{t+1} u0 .

Due to the invertibility condition, i.e. for |β| < 1, the one-step forecast for
large values of t no longer depends on the unknown initial values x0 and u0.
   For the τ-step forecast, τ = 2, 3, ..., we get

              x̂t(2)  =  δ + α x̂t(1)
              x̂t(3)  =  δ + α x̂t(2)
                 ...


Using (2.61), these forecasts can be calculated recursively.
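
Putting the pieces together for an ARMA(1,1): residuals are backcast via (2.60),
the one-step forecast follows x̂t(1) = δ + α xt – β ut, and longer horizons use
x̂t(τ) = δ + α x̂t(τ-1). All parameter values and the simulated series below are
illustrative assumptions.

```python
import numpy as np

def arma11_forecasts(x, delta, alpha, beta, max_tau, x0=0.0, u0=0.0):
    # backcasting: start from arbitrary fixed x0, u0 whose influence dies out for |beta| < 1
    xhat1 = delta + alpha * x0 - beta * u0        # x̂_0(1)
    for x_t in x:                                 # x_1, ..., x_T
        u_t = x_t - xhat1                         # (2.60)
        xhat1 = delta + alpha * x_t - beta * u_t  # x̂_t(1)
    forecasts = [xhat1]
    for _ in range(2, max_tau + 1):
        forecasts.append(delta + alpha * forecasts[-1])   # x̂_T(tau) = delta + alpha*x̂_T(tau-1)
    return forecasts

# simulate an illustrative ARMA(1,1) and forecast four steps ahead
rng = np.random.default_rng(3)
delta, alpha, beta = 1.0, 0.8, 0.4
u = rng.normal(size=300)
x = np.zeros(300)
for t in range(1, 300):
    x[t] = delta + alpha * x[t - 1] + u[t] - beta * u[t - 1]
print(arma11_forecasts(x[1:], delta, alpha, beta, max_tau=4))
```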


2.4.3 Evaluation of Forecasts

Forecasts can be evaluated ex post, i.e. when the realised values are avail-
able. There are many kinds of measures to do this. Quite often, only graphs
and/or scatter diagrams of the predicted values and the corresponding ob-
served values of a time series are plotted. Intuitively, a forecast is ‘good’ if
the predicted values describe the development of the series in the graphs
relatively well, or if the points in the scatter diagram are concentrated
around the angle bisecting line in the first and/or third quadrant. Such in-
tuitive arguments are, however, not founded on the above-mentioned con-
siderations on optimal predictions. For example, as (2.59) shows, the op-
timal one-step forecast of an MA(1) process is a pure random process.
This implies that the graphs compare two quite different processes. Con-
clusion 6 given above states that the following decomposition holds for the
variances of the data generating process, the forecasts and the forecast
errors,

              V[xt+τ]  =  V[x̂t(τ)]  +  V[ft(τ)] .
Thus, it is obvious that predicted and realised values are generally generat-
ed by different processes.
   As a result, a measure for the predictability of stationary processes can
be developed. It is defined as follows,

(2.62)        P(τ)²  =  V[x̂t(τ)] / V[xt]  =  1 – V[ft(τ)] / V[xt] ,

with 0 ≤ P(τ)² ≤ 1. At the same time, P(τ)² is the squared correlation coef-
ficient between the predicted and the realised values of x. The optimal
forecast of a pure random process with mean zero is x̂t(τ) = 0, i.e.
P(τ)² = 0. Such a process cannot be predicted. On the other hand, for the
one-step forecast of an MA(1) process, we can write

              P(1)²  =  β² σ² / ((1 + β²) σ²)  =  β² / (1 + β²)  >  0 .
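
In terms of the ψ-weights, (2.62) is one line of code. The sketch below uses
illustrative parameters and reproduces P(1)² = α² for an AR(1) and
P(1)² = β²/(1 + β²) for an MA(1).

```python
import numpy as np

def predictability(psi, tau, sigma2=1.0):
    # (2.62): P(tau)^2 = 1 - V[f_t(tau)] / V[x_t], both variances from the psi-weights
    psi = np.asarray(psi, dtype=float)
    var_x = sigma2 * np.sum(psi ** 2)
    var_f = sigma2 * np.sum(psi[:tau] ** 2)       # (2.57)
    return 1.0 - var_f / var_x

alpha, beta = 0.8, 0.5                            # illustrative parameters
psi_ar1 = alpha ** np.arange(500)                 # AR(1): psi_j = alpha**j
psi_ma1 = np.array([1.0, -beta])                  # MA(1): psi_0 = 1, psi_1 = -beta
print(predictability(psi_ar1, tau=1))             # = alpha**2 = 0.64
print(predictability(psi_ma1, tau=1))             # = beta**2/(1+beta**2) = 0.2
```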
However, the decomposition (2.58), which is theoretically valid for opti-
mal forecasts, does not hold for actual (empirical) forecasts, even if they
are generated by using (estimated) ARMA processes. This is due to the
fact that forecast errors are hardly ever totally uncorrelated with the fore-
casts. Therefore, the value of P(τ)² might even become negative for ‘bad’
forecasts.
   JACOB MINCER and VICTOR ZARNOWITZ (1969) made the following
suggestion to check the consistency of forecasts. Using OLS, the following
regression equation is estimated:

(2.63)        xt+τ  =  a0 + a1 x̂t(τ) + εt+τ .

It is tested, either individually with t tests or jointly with an F test, whether
a0 = 0 and a1 = 1. If this is fulfilled, the forecasts are said to be consistent.
However, such a regression produces consistent estimates of the parame-
ters if and only if x̂t(τ) and εt+τ are asymptotically uncorrelated.

Moreover, to get consistent estimates of the variances, which is necessary
for the validity of the test results, the residuals have to be pure random
processes. Even under the null hypothesis of optimal forecasts, this only
holds for one-step predictions. Thus, the usual F and t tests can only be
used for τ = 1. For τ > 1, the MA(τ-1) process of the forecast errors has to
be taken into account when the variances are estimated. A procedure for
such situations combines Ordinary Least Squares for the estimation of the
parameters and Generalised Least Squares for the estimation of the vari-
ances, as proposed by BRYAN W. BROWN and SHLOMO MAITAL (1981).
   JINOOK JEONG and GANGADHARRAO S. MADDALA (1991) have pointed
out another problem related to these tests. Even rational forecasts are usu-
ally not without errors; they contain measurement errors. This implies,
however, that (2.63) cannot be estimated consistently with OLS; an in-
strumental variables estimator must be used. An alternative to the estima-
tion of (2.63) is therefore to estimate a univariate MA(τ-1) model for the
forecast errors of a τ-step prediction,

              ft(τ)  =  a0 + ut + a1 ut-1 + a2 ut-2 + ... + a_{τ-1} u_{t-τ+1} ,

and to check the null hypothesis H0: a0 = 0 as well as whether the estimat-
ed residuals ût are white noise.
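
For one-step forecasts (τ = 1) the Mincer-Zarnowitz regression (2.63) with the
joint hypothesis a0 = 0, a1 = 1 can be run with plain OLS; the series below are
invented. For τ > 1 the MA(τ-1) structure of the errors would have to be taken
into account, as just discussed.

```python
import numpy as np
from scipy.stats import f as f_dist

def mincer_zarnowitz(realised, forecast):
    # OLS of x_{t+1} on a constant and x̂_t(1); F test of H0: a0 = 0 and a1 = 1
    y = np.asarray(realised, dtype=float)
    xhat = np.asarray(forecast, dtype=float)
    X = np.column_stack([np.ones_like(xhat), xhat])
    coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    rss_u = np.sum((y - X @ coef) ** 2)           # unrestricted residual sum of squares
    rss_r = np.sum((y - xhat) ** 2)               # restricted model: a0 = 0, a1 = 1
    n = y.size
    F = ((rss_r - rss_u) / 2.0) / (rss_u / (n - 2))
    return coef, F, f_dist.sf(F, 2, n - 2)

rng = np.random.default_rng(4)
xhat = rng.normal(size=120)                       # hypothetical one-step forecasts
x_real = xhat + rng.normal(scale=0.5, size=120)   # realisations = forecasts + error
print(mincer_zarnowitz(x_real, xhat))             # a0 near 0, a1 near 1, large p-value
```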
   On the other hand, simple descriptive measures, which are often em-
ployed to evaluate the performance of forecasts, are based on the average
values of the forecast errors over the forecast horizon. The simple arithme-
tic mean indicates whether the values of the variable are – on average –
over- or underestimated. However, the disadvantage of this measure is that
large over- and underestimates cancel each other out. The mean absolute
error is often used to avoid this effect. Starting the forecasts from a fixed
point of time, t0, and assuming that realisations are available up to t0+m,
we get

              MAE(τ)  =  (1/(m+1)) Σ_{j=0}^{m} | f_{t0+j}(τ) | ,    τ = 1, 2, ... .

Every forecast error gets the same weight in this measure. The root mean
square error is often used to give particularly large errors a stronger
weight:

              RMSE(τ)  =  [ (1/(m+1)) Σ_{j=0}^{m} f_{t0+j}(τ)² ]^{1/2} ,    τ = 1, 2, ... .

These measures are not normalised, i.e. their size depends on the scale of
the data.

   The inequality measure proposed by HENRY THEIL (1961) avoids this
problem by comparing the actual forecasts with so-called naïve forecasts,
i.e. the realised value of the last available observation,

              U(τ)  =  [ Σ_{j=0}^{m} f_{t0+j}(τ)²  /  Σ_{j=0}^{m} (x_{t0+j+τ} – x_{t0+j})² ]^{1/2} ,    τ = 1, 2, ... .

If U(τ) = 1, the forecast is as good as the naïve forecast, x̂t(τ) = xt. For
U(τ) < 1 the forecasts perform better than the naïve one. MAE, RMSE and
Theil’s U all become zero if predicted and realised values are identical
over the whole forecast horizon.
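
The three descriptive measures then only require the forecast errors and the
corresponding naïve ('no change') errors; a sketch with invented error series:

```python
import numpy as np

def mae(errors):
    return np.mean(np.abs(errors))

def rmse(errors):
    return np.sqrt(np.mean(np.square(errors)))

def theil_u(errors, naive_errors):
    # Theil's U: root of the error sum of squares relative to the naive forecast
    return np.sqrt(np.sum(np.square(errors)) / np.sum(np.square(naive_errors)))

rng = np.random.default_rng(5)
f_err = rng.normal(scale=1.0, size=26)            # invented forecast errors f_{t0+j}(1)
naive_err = rng.normal(scale=1.8, size=26)        # invented errors of the naive forecast
print(mae(f_err), rmse(f_err), theil_u(f_err, naive_err))   # U < 1: better than naive
```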

Example 2.16
All these measures can also be applied to forecasts which are not generated by
ARMA models, as, for example, the forecasts of the Council of Economic Experts
or the Association of German Economic Research Institutes. Since the end of the
1960’s, both institutions have published forecasts of the German economic devel-
opment for the following year, the institutes usually in October and the Council at
the end of November. HANNS MARTIN HAGEN and GEBHARD KIRCHGÄSSNER
(1996) investigated the annual forecasts of the growth rates of GNP for the period
from 1970 to 1995 as well as for the sub-periods from 1970 to 1982 and from
1983 to 1995. These periods correspond to the social-liberal government of SPD
and FDP and the conservative-liberal government of CDU/CSU and FDP.
   The results are given in Table 2.2. Besides the criteria given above, the table al-
so indicates the square of the correlation coefficient between realised and predict-
ed values (R2), the estimated regression coefficient â1 of the test equation (2.63) as
well as the mean error (ME). According to almost all criteria, the forecasts of the
Council outperform those of the institutes. This was to be expected, as the Coun-
cil’s forecasts are produced slightly later, at a time when more information is
available. It holds for the forecasts of both institutions that the mean absolute er-
ror, the root mean squared error as well as Theil's U are smaller in the second pe-
riod compared to the first one. This is some evidence that the forecasts might have
improved over time. On the other hand, the correlation coefficient between pre-
dicted and realised values has also become smaller. This indicates a deterioration
of the forecasts. It has to be taken into account that the variance of the variable to
be predicted was considerably smaller in the second period as compared to the
first one. Thus, the smaller errors do not necessarily indicate improvements of the
forecasts. It is also interesting to note that on average the forecast errors of both
institutions were negative in the first and positive in the second sub-period. They
tended to overestimate the development in the period of the social-liberal coalition
and to underestimate it in the period of the conservative-liberal coalition.


              Table 2.2: Forecasts of the Council of Economic Experts
                         and of the Economic Research Institutes

                  Period        R2      RMSE     MAE      ME       â1       U

                1970 – 1995    0.369    1.838    1.346   -0.250*  1.005*   0.572
  Institutes    1970 – 1982    0.429    2.291    1.654   -0.731   1.193*   0.625
                1983 – 1995    0.399    1.229    1.038    0.231   1.081    0.457

  Council of    1970 – 1995    0.502*   1.647*   1.171*  -0.256   1.114    0.512*
  Economic      1970 – 1982    0.599*   2.025*   1.477*  -0.723*  1.354    0.552*
  Experts       1983 – 1995    0.472*   1.150*   0.865*   0.212*  1.036*   0.428*

   ‘*’ denotes the ‘better’ of the two forecasts.




2.5 The Relation between Econometric Models and
    ARMA Processes

The ARMA model-based forecasts discussed in the previous section are
unconditional forecasts. The only information that is used to generate the-
se forecasts is the information contained in the current and past values of
the time series. There is demand for such forecasts, and – as mentioned
above – one of the reasons for the development and the popularity of the
Box-Jenkins methodology presented in this chapter is that by applying the
above-mentioned approaches, these predictions perform – at least partly –
much better than forecasts generated by large scale econometric models.
Thus, the Box-Jenkins methodology seems to be a (possibly much better)
alternative to the traditional econometric methodology.
   However, this perspective is rather restricted. On the one hand, condi-
tional rather than unconditional forecasts are required in many cases, for
example, in order to evaluate the effect of a tax reform on economic
growth. Such forecasts cannot be generated by using (only) univariate
models. On the other hand, and more importantly, the separation of the two
approaches is much less strict than it seems to be at first glance. As
ARNOLD ZELLNER and FRANZ C. PALM (1974) showed, linear dynamic
simultaneous equation systems as used in traditional econometrics can be
transformed into ARMA models. (Inversely, multivariate time series mod-
els as discussed in the next chapters can be transformed into traditional
econometric models.) The univariate ARMA models correspond to the
final equations of econometric models in the terminology of JAN
TINBERGEN (1940).
   Let us consider a very simple model. An exogenous, weakly stationary
variable x, as defined in (2.64b), has a current and lagged impact on the
dependent variable y, while the error term might be autocorrelated. Thus,
we get the model

(2.64a)       yt  =  δ1(L) xt + δ2(L) u1,t ,

(2.64b)       α(L) xt  =  β(L) u2,t ,

where δ1(L) and δ2(L) are lag polynomials of finite order. If we insert
(2.64b) into (2.64a), we get for y the univariate model

(2.64a')      α(L) yt  =  ω(L) vt

with

              ω(L) vt  :=  δ1(L) β(L) u2,t + δ2(L) α(L) u1,t .

As ω(L) vt is an MA process of finite order, we get a finite order ARMA
representation for y. It must be pointed out that the univariate representa-
tions of the two variables have the same finite order AR term.
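
The step from (2.64a)/(2.64b) to (2.64a') is nothing but polynomial multiplication
in the lag operator, which can be checked with np.convolve. The low-order
polynomials below are invented and use the notation of the reconstructed
equations; a coefficient array [1, -0.7] stands for 1 – 0.7L.

```python
import numpy as np

alpha  = np.array([1.0, -0.7])    # alpha(L)  = 1 - 0.7L   (AR polynomial of x)
beta   = np.array([1.0,  0.4])    # beta(L)   = 1 + 0.4L   (MA polynomial of x)
delta1 = np.array([0.5,  0.3])    # delta1(L) = 0.5 + 0.3L
delta2 = np.array([1.0, -0.2])    # delta2(L) = 1 - 0.2L

# alpha(L) y_t = delta1(L) beta(L) u_{2,t} + delta2(L) alpha(L) u_{1,t}
ma_u2 = np.convolve(delta1, beta)     # coefficients of delta1(L)*beta(L)
ma_u1 = np.convolve(delta2, alpha)    # coefficients of delta2(L)*alpha(L)

print("AR part of y :", alpha)        # same finite-order AR polynomial as x
print("MA part (u2) :", ma_u2)        # both MA operators are of finite order,
print("MA part (u1) :", ma_u1)        # so y has a finite-order ARMA representation
```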


References

Since the time when HERMAN WOLD developed the class of ARMA processes in
his dissertation and GEORGE E.P. BOX and GWILYM M. JENKINS (1970) popular-
ised and further developed this model class in the textbook mentioned above, there
have been quite a lot of textbooks dealing with these models at different technical
levels. An introduction focusing on empirical applications is, for example, to be
found in
ROBERT S. PINDYCK and DANIEL L. RUBINFELD, Econometric Models and Eco-
   nomic Forecasts, McGraw-Hill, Boston et al., 4th edition 1998, Chapter 17f.
   pp. 521 – 578,
PETER J. BROCKWELL and RICHARD A. DAVIS, Introduction to Time Series and
   Forecasting, Springer, New York et al. 1996, as well as
TERENCE C. MILLS, Time Series Techniques for Economists, Cambridge Universi-
   ty Press, Cambridge (England) 1990. Contrary to this,
PETER J. BROCKWELL and RICHARD A. DAVIS, Time Series: Theory and Methods,
   Springer, New York et al. 1987,
give a rigorous presentation in probability theory. Along with the respective
proofs of the theorems, this textbook shows, however, many empirical examples.

  Autoregressive processes for the residuals of an estimated regression equation
were used for the first time in econometrics by
DONALD COCHRANE and GUY H. ORCUTT, Application of Least Squares Regres-
  sion to Relationships Containing Autocorrelated Error Terms, Journal of the
  American Statistical Association 44 (1949), pp. 32 – 61.
The different information criteria to detect the order of an autoregressive process
are presented in
HIROTUGU AKAIKE, Fitting Autoregressive Models for Prediction, Annals of the
   Institute of Statistical Mathematics 21 (1969), pp. 243 – 247,
HIROTUGU AKAIKE, A New Look at the Statistical Model Identification, IEEE
   Transactions on Automatic Control AC-19 (1974), pp. 716 – 723,
GIDEON SCHWARZ, Estimating the Dimensions of a Model, Annals of Statistics 6
   (1978), pp. 461 – 464, as well as in
EDWARD J. HANNAN and BARRY G. QUINN, The Determination of the Order of an
   Autoregression, Journal of the Royal Statistical Society B 41 (1979), pp. 190
   – 195.
The effect of temporal aggregation on the first differences of temporal averages
was first investigated by
HOLBROOK WORKING, Note on the Correlation of First Differences of Averages in
   a Random Chain, Econometrica 28 (1960), pp. 916 – 918
and later on, in more detail, by
GEORGE C. TIAO, Asymptotic Behaviour of Temporal Aggregates of Time Series,
   Biometrika 59 (1972), pp. 525 – 531.
The approach to check the consistency of predictions was developed by
JACOB MINCER and VICTOR ZARNOWITZ, The Evaluation of Economic Forecasts,
   in: J. MINCER (ed.), Economic Forecasts and Expectations, National Bureau
   of Economic Research, New York 1969.
The use of MA processes of the forecast errors to estimate the variances of the es-
timated parameters was presented by
BRYAN W. BROWN and SHLOMO MAITAL, What Do Economists Know? An Em-
   pirical Study of Experts’ Expectations, Econometrica 49 (1981), pp. 491 –
   504.
The fact that measurement errors also play a role in rational forecasts and that,
therefore, instrumental variable estimators should be used, was indicated by
JINOOK JEONG and GANGADHARRAO S. MADDALA, Measurement Errors and Tests
    for Rationality, Journal of Business and Economic Statistics 9 (1991), pp. 431
    – 439.

These procedures have been applied to the common forecasts of the German eco-
nomic research institutes by
GEBHARD KIRCHGÄSSNER, Testing Weak Rationality of Forecasts with Different
   Time Horizons, Journal of Forecasting 12 (1993), pp. 541 – 558.
Moreover, the forecasts of the German Council of Economic Experts as well as
those of the German Economic Research Institutes were investigated in
HANNS MARTIN HAGEN and GEBHARD KIRCHGÄSSNER, Interest Rate Based Fore-
   casts of German Economic Growth: A Note, Weltwirtschaftliches Archiv 132
   (1996), pp. 763 – 773.
The measure of inequality (Theil’s U) was proposed by
HENRY THEIL, Economic Forecasts and Policy, North-Holland, Amsterdam 1961.
An alternative measure is given in
HENRY THEIL, Applied Economic Forecasting, North-Holland, Amsterdam 1966.
Today, both measures are used in computer programmes. Quite generally, fore-
casts for time series data are discussed in
CLIVE W.J. GRANGER, Forecasting in Business and Economics, Academic Press,
   2nd edition 1989.
On the evaluation of the predictive accuracy of forecasts see
FRANCIS X. DIEBOLD and ROBERTO S. MARIANO, Comparing Predictive Accuracy,
   Journal of Business and Economic Statistics 13 (1995), pp. 253 – 263.
The relationship between time series models and econometric equation sys-
tems is analysed in
ARNOLD ZELLNER and FRANZ C. PALM, Time Series Analysis and Simultaneous
   Equation Econometric Models, Journal of Econometrics 2 (1974), pp. 17 –
   54.
See for this also
FRANZ C. PALM, Structural Econometric Modeling and Time Series Analysis: An
   Integrated Approach, in: A. ZELLNER (ed.), Applied Time Series Analysis of
   Economic Data, U.S. Department of Commerce, Economic Research Report
   ER-S, Washington 1983, pp. 199 – 230.
The term final equation originates from
JAN TINBERGEN, Econometric Business Cycle Research, Review of Economic
   Studies 7 (1940), pp. 73 – 90.
An introduction into the solution of difference equations is given in
WALTER ENDERS, Applied Econometric Time Series, 3rd edition, Wiley, Hoboken,
  N.J. 2010, Chapter 1.

The permanent income hypothesis as a determinant of consumption expenditure
was developed by
MILTON FRIEDMAN, A Theory of the Consumption Function, Princeton University
   Press, Princeton N.J. 1957.
The example of the estimated popularity function is given in
GEBHARD KIRCHGÄSSNER, Causality Testing of the Popularity Function: An Em-
   pirical Investigation for the Federal Republic of Germany, 1971 – 1982, Pub-
   lic Choice 45 (1985), pp. 155 – 173.
http://www.springer.com/978-3-642-33435-1

Figure 2.5: AR(2) process with α1 = 1.5, α2 = −0.56
a) Realisation
b) Theoretical autocorrelation function
c) Estimated autocorrelation function with confidence intervals
Figure 2.6: AR(2) process with α1 = 1.4 and α2 = −0.85
a) Realisation
b) Theoretical autocorrelation function
c) Estimated autocorrelation function with confidence intervals
  • 21. 2.1 Autoregressive Processes 47 is the frequency of the oscillation. The period of the cycles is P = 2 /f. Processes with conjugate complex roots are well-suited to describe busi- ness cycle fluctuations. Example 2.5 Consider the AR(2) process (E2.4) xt = 1.4 xt-1 – 0.85 xt-2 + ut, with a variance of ut of 1. The characteristic equation 2 – 1.4 + 0.85 = 0 has the two solutions 1 = 0.7 + 0.6i and 2 = 0.7- 0.6i. (‘i’ stands for the imagi- nary unit: i2 = - 1.) The modulus (dampening factor) is d = 0.922. Thus, (E2.4) with stochastic initial conditions and a mean of zero is stationary. According to (2.24) the variance is given by (0) = 8.433. A realisation of this process with 180 observations is given in Figure 2.6. Its development is cyclical around its zero mean. For the autocorrelation function we get ( ) – 1.4 ( -1) + 0.85 ( -2) = 0, = 2, 3, ..., (0) = 1, (1) = 0.76, because of (2.25). The general solution is ( ) = 0.922 (C1 cos (0.709 ) + C2 sin (0.709 )) . Taking into account the two initial conditions, we get for the autocorrelation coef- ficients ( ) = 0.922 (cos (0.709 ) + 0.1 sin (0.709 )) , with a frequency of f = 0.709. In case of quarterly data, this corresponds to a period length of about 9 quarters. Both the theoretical and the estimated autocorrelations in Figure 2.6 show this kind of dampened periodical behaviour. Example 2.6 Figure 2.7 shows the development of the three month money market rate in Frank- furt (GSR) from the first quarter of 1970 to the last quarter of 1998 as well as the autocorrelation and the partial autocorrelation functions explained in Section 2.1.4. Whereas the autocorrelation function tends only slowly towards zero, the partial autocorrelation function breaks off after two lags. As will be shown below, this indicates an AR(2) process. For the period from 1970 to 1998, estimation with OLS results in the following:
Figure 2.7: Three month money market rate in Frankfurt, 1970 – 1998
a) Three month money market rate in Frankfurt 1970 – 1998 (percent)
b) Estimated autocorrelation and partial autocorrelation functions with confidence intervals
c) Estimated autocorrelation function of the residuals of the estimated AR(2) process with confidence intervals
  • 23. 2.1 Autoregressive Processes 49 GSRt = 0.575 + 1.407 GSRt-1 – 0.498 GSRt-2 + ût,. (2.82) (17.50) (-6.16) R 2 = 0.910, SE = 0.812, Q(6) = 6.475 (p = 0.372), with t values being again given in parentheses. On the 0.1 percent level, both es- timated coefficients of the lagged interest rates are significantly different from ze- ro. The autocorrelogram of the estimated residuals (given in Figure 2.7c) as well as the Ljung-Box Q statistic which is calculated with 8 correlation coefficients (and 6 degrees of freedom) does not indicate any higher order process. The two roots of the process are 0.70 ± 0.06i, i.e. they indicate dampened cycles. The modulus (dampening factor) is d = 0.706; the frequency f = 0.079 corresponds to a period of 79.7 quarters and therefore of nearly 20 years. Correspondingly, this oscillation cannot be detected in the estimated autocorrelogram presented in Fig- ure 2.7b. 2.1.3 Higher Order Autoregressive Processes An AR(p) process can be described by the following stochastic difference equation, (2.26) xt = + 1 xt-1 + 2 xt-2 + ... + p xt-p + ut, with p 0, where ut is again a pure random process with zero mean and variance 2. Using the lag operator we can also write: (2.26') (1 – 1 L– 2 L2 – ... – p Lp) xt = + ut. If we assume stochastic initial conditions, the AR(p) process in (2.26) is stationary if the stability conditions are satisfied, i.e. if the characteristic equation p p-1 p-2 (2.27) – 1 – 2 – ... – p = 0 only has roots with absolute values smaller than one, or if the solutions of the lag polynomial (2.28) 1– 1 L– 2 L2 – ... – p Lp = 0 only have roots with absolute values larger than one. If the stability conditions are satisfied, we get the Wold representation of the AR(p) process by the series expansion of the inverse lag polynomial, 1 2 p = 1+ 1L + 2L + ... 1 1L ... pL as
  • 24. 50 Univariate Stationary Processes (2.29) xt = j ut j . 1 1 ... p j 0 Generalising the approach that was used to calculate the coefficients of the AR(2) process, the series expansion can again be calculated by the method of undetermined coefficients. From (2.29) we get the constant (unconditional) expectation as E[xt] = = . 1 1 ... p Again, similarly to the AR(1) and AR(2) cases, a necessary condition for stability is 1– 1 – 2 – ... – p > 0. Without loss of generality we can set = 0, i.e. = 0, in order to calcu- late the autocovariances. Because of ( ) = E[xt- xt], we get according to (2.26) (2.30) ( ) = E[xt- ( 1 xt-1 + 2 xt-2 + ... + p xt-p + ut)] . For = 0, 1, ... , p, it holds that 2 (0) 1 (1) 2 (2) p (p) (1) 1 (0) 2 (1) p (p 1) (2.31) (p) 1 (p 1) 2 (p 2) p (0) because of the symmetry of the autocovariances and because of E[xt- ut] = 2 for = 0 and zero for > 0. This is a linear inhomogeneous equation system for given i and 2 to derive the p + 1 unknowns (0), (1), ..., (p). For > p we get the linear homogeneous difference equation to calculate the autocovariances of order > p: (2.32) ( )– 1 ( -1) – ... – p ( -p) = 0. If we divide (2.32) by (0), we get the corresponding difference equation to calculate the autocorrelations: (2.33) ( )– 1 ( -1) – ... – p ( -p) = 0. The initial conditions (1), (2), ..., (p) can be derived from the so-called Yule-Walker equations. We get those if we successively insert = 1, 2, ..., p in (2.33), or, if the last p equations in (2.31) are divided by (0),
  • 25. 2.1 Autoregressive Processes 51 (1) = 1 + 2 (1) + 3 (2) + ... + p (p-1) (2) = 1 (1) + 2 + 3 (1) + ... + p (p-2) (2.34) (p) = 1 (p-1) + 2 (p-2) + 3 (p-3) + ... + p If we define ' = ( (1), (2), ..., (p)), ' = ( 1, 2, ..., p) and 1 (1) (2) (p 1) (1) 1 (1) (p 2) R p p (p 1) (p 2) (p 3) 1 we can write the Yule-Walker equations (2.34) in matrix form, (2.35) = R . If the first p autocorrelation coefficients are given, the coefficients of the AR(p) process can be calculated according to (2.35) as (2.36) = R-1 . Equations (2.35) and (2.36) show that there is a one-to-one mapping be- tween the p coefficients and the first p autocorrelation coefficients of an AR(p) process. If there is a generating pure random process, it is suffi- cient to know either or to identify the AR(p) process. Thus, there are two possibilities to describe the structure of an autoregressive process of order p: the parametric representation that uses the parameters 1, 2, ..., p, and the non-parametric representation with the first p autocorrelation coef- ficients (1), (2), ..., (p). Both representations contain exactly the same information. Which representation is used depends on the specific situa- tion. We usually use the parametric representation to describe finite order autoregressive processes (with known order). Example 2.7 Let the fourth order autoregressive process xt = 4 xt-4 + ut, 0 < 4 < 1, 2 be given, where ut is again white noise with zero mean and variance . Applying (2.31) we get: 2 (0) = 4 (4) + , (1) = 4 (3), (2) = 4 (2),
  • 26. 52 Univariate Stationary Processes (3) = 4 (1), (4) = 4 (0). From these relations we get 2 (0) = 2 , 1 4 (1) = (2) = (3) = 0, 2 (4) = 4 2 . 1 4 As can easily be seen, only the autocovariances with lag = 4j, j = 1, 2, ... are dif- ferent from zero, while all other autocovariances are zero. Thus, for > 0 we get the autocorrelation function j 4 for 4 j, j 1, 2, ... ( ) = . 0 elsewhere. Only every fourth autocorrelation coefficient is different from zero; the sequence of these autocorrelation coefficients decreases monotonically like a geometric se- ries. Employing such a model for quarterly data, this AR(4) process captures the correlation between random variables that are distant from each other by a multi- plicity of four periods, i.e. the structure of the correlations of all variables which belong to the i-th quarter of a year, i = 1, 2, 3, 4, follows an AR(1) process while the correlations between variables that belong to different quarters are always ze- ro. Such an AR(4) process provides a simple possibility of modelling seasonal ef- fects which typically influence the same quarters of different years. For empirical applications, it is advisable to first eliminate the deterministic component of a sea- sonal variation by employing seasonal dummies and then to model the remaining seasonal effects by such an AR(4) process. 2.1.4 The Partial Autocorrelation Function Due to the stability conditions, autocorrelation functions of stationary fi- nite order autoregressive processes are always sequences that converge to zero but do not break off. This makes it difficult to distinguish between processes of different orders when using the autocorrelation function. To cope with this problem, we introduce a new concept, the partial autocorre- lation function. The partial correlation between two random variables is the correlation that remains if the possible impact of all other random vari- ables has been eliminated. To define the partial autocorrelation coefficient, we use the new notation,
  • 27. 2.1 Autoregressive Processes 53 xt = k1xt-1 + k2xt-2 + … + kkxt-k + ut , where ki is the coefficient of the variable with lag i if the process has or- der k. (According to the former notation it holds that i = ki i = 1,2,…,k.) The coefficients kk are the partial autocorrelation coefficients (of order k), k = 1,2,… . The partial autocorrelation measures the correlation between xt and xt-k which remains when the influences of xt-1, xt-2, ..., xt-k+1 on xt and xt-k have been eliminated. Due to the Yule-Walker equations (2.35), we can derive the partial au- tocorrelation coefficients kk from the autocorrelation coefficients if we calculate the coefficients kk, which belong to xt-k, for k = 1, 2, ... from the corresponding linear equation systems 1 (1) (2) (k 1) k1 (1) (1) 1 (2) (k 2) k2 (2) , k = 1, 2, ... . (k 1) (k 2) (k 3) 1 kk (k) With Cramer’s rule we get 1 (1) (1) (1) 1 (2) (k 1) (k 2) (k) (2.37) kk , k = 1, 2, ... . 1 (1) (k 1) (1) 1 (k 2) (k 1) (k 2) 1 Thus, if the data generating process (DGP) is an AR(1) process, we get for the partial autocorrelation function: 11 = (1) 1 (1) (1) (2) (2) (1) 2 22 = = = 0, 1 (1) 1 (1) 2 (1) 1
  • 28. 54 Univariate Stationary Processes because of (2) = (1)2. Generally, the partial autocorrelation coefficients kk = 0 for k >1 in an AR(1) process. If the DGP is an AR(2) process, we get (2) (1)2 11 = (1), 22 = , kk = 0 for k > 2 . 1 (1)2 The same is true for an AR(p) process: all partial autocorrelation coeffi- cients of order higher than p are zero. Thus, for finite order autoregressive processes, the partial autocorrelation function provides the possibility of identifying the order of the process by the order of the last non-zero partial autocorrelation coefficient. We can estimate the partial autocorrelation co- efficients consistently by substituting the theoretical values in (2.37) by their consistent estimates (1.10). For the partial autocorrelation coefficients which have a theoretical value of zero, i.e. the order of which is larger than the order of the process, we get asymptotically that they are normally dis- tributed with E[ ˆ kk ] = 0 and V[ ˆ kk ] = 1/T for k > p . Example 2.8 The AR(1) process of Example 2.1 has the following theoretical partial autocorre- lation function: 11 = (1) = and zero elsewhere. In this example, takes on the values 0.9, 0.5 and -0.9. The estimates of the partial autocorrelation functions for the realisations in Figures 2.1 and 2.3 are presented in Figure 2.8. It is obvious for both processes that these are AR(1) processes. The estimated value for the process with = 0.9 is ˆ 11 = 0.91, while all other partial autocorrelation coefficients are not significantly different from zero. We get ˆ = -0.91 for the process with 11 = -0.9, while all estimated higher order partial autocorrelation coefficients do not deviate significantly from zero. The AR(2) process of Example 2.4 has the following theoretical partial auto- correlation function: 11 = 0.96, 22 = -0.56 and zero elsewhere. The realisation of this process, which is given in Figure 2.5, leads to the empirical partial autocorre- lation function in Figure 2.8. It corresponds quite closely to the theoretical func- tion; we get ˆ 11 = 0.95 and ˆ 22 = -0.60 and all higher order partial autocorrelation coefficients are not significantly different from zero. The same holds for the AR(2) process with the theoretical non-zero partial autocorrelations 11 = 0.76 and ˆ ˆ 22 = -0.85 given in Example 2.5. We get the estimates 11 = 0.76 and 22 = -0.78, whereas all higher order partial correlation coefficients are not significantly differ- ent from zero.
Figure 2.8: Estimated partial autocorrelation functions
AR(1) process with α = 0.9; AR(1) process with α = −0.9; AR(2) process with α1 = 1.5, α2 = −0.56; AR(2) process with α1 = 1.4, α2 = −0.85
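Estimated partial autocorrelation functions such as those in Figure 2.8 can be reproduced directly from formula (2.37) once the theoretical autocorrelations are replaced by their sample counterparts. The Python sketch below does this via the determinant ratio; the simulated AR(2) realisation uses the parameters of Example 2.4, while the function names, the seed and the number of lags are illustrative assumptions. In practice, a Durbin–Levinson type recursion would be the numerically more efficient route.

```python
import numpy as np

def sample_acf(x, kmax):
    """Sample autocorrelations rho_hat(1), ..., rho_hat(kmax)."""
    x = x - x.mean()
    denom = (x * x).sum()
    return np.array([(x[k:] * x[:-k]).sum() / denom for k in range(1, kmax + 1)])

def sample_pacf(x, kmax):
    """Partial autocorrelations phi_hat(k,k) via the determinant ratio (2.37)."""
    rho = np.concatenate(([1.0], sample_acf(x, kmax)))
    pacf = np.empty(kmax)
    for k in range(1, kmax + 1):
        R = np.array([[rho[abs(i - j)] for j in range(k)] for i in range(k)])
        Rk = R.copy()
        Rk[:, -1] = rho[1:k + 1]        # replace last column by (rho(1), ..., rho(k))'
        pacf[k - 1] = np.linalg.det(Rk) / np.linalg.det(R)
    return pacf

# illustrative AR(2) realisation with the parameters of Example 2.4
rng = np.random.default_rng(1)
T = 180
x = np.zeros(T)
u = rng.normal(size=T)
for t in range(2, T):
    x[t] = 1.5 * x[t - 1] - 0.56 * x[t - 2] + u[t]

phi = sample_pacf(x, 10)
print(np.round(phi, 2))                 # phi_hat(1,1) and phi_hat(2,2) large, the rest roughly zero
print("approx. 95 percent band: +/-", round(2 / np.sqrt(T), 2))
```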
2.1.5 Estimating Autoregressive Processes

Under the assumption of a known order p we have different possibilities to estimate the parameters:

(i) If we know the distribution of the white noise process that generates the AR(p) process, the parameters can be estimated by using maximum likelihood (ML) methods.

(ii) The parameters can also be estimated with the method of moments by using the Yule-Walker equations.

(iii) A further possibility is to treat (2.26),
xt = δ + α1 xt-1 + α2 xt-2 + ... + αp xt-p + ut,
as a regression equation and apply the ordinary least squares (OLS) method for estimation. OLS provides consistent estimates. Moreover, if (2.26) fulfils the stability conditions, √T(δ̂ – δ) as well as √T(α̂i – αi), i = 1, 2, ..., p, are asymptotically normally distributed.

If the order of the AR process is unknown, it can be estimated with the help of information criteria. For this purpose, AR processes with successively increasing orders p = 1, 2, ..., pmax are estimated. Finally, the order p* is chosen which minimises the respective criterion. The following criteria are often used:

(i) The final prediction error, which goes back to HIROTUGU AKAIKE (1969),
FPE = ((T + m)/(T – m)) · (1/T) Σ_{t=1}^{T} (û_t^(p))².

(ii) Closely related to this is the Akaike information criterion (HIROTUGU AKAIKE (1974)),
AIC = ln((1/T) Σ_{t=1}^{T} (û_t^(p))²) + m · 2/T.

(iii) Alternatives are the Bayesian criterion of GIDEON SCHWARZ (1978),
SC = ln((1/T) Σ_{t=1}^{T} (û_t^(p))²) + m · ln T/T,

(iv) as well as the criterion developed by EDWARD J. HANNAN and BARRY G. QUINN (1979),
HQ = ln((1/T) Σ_{t=1}^{T} (û_t^(p))²) + m · 2 ln(ln T)/T.

Here, û_t^(p) are the estimated residuals of the AR(p) process, while m is the number of estimated parameters. If the constant term is estimated, too, m = p + 1 for an AR(p) process. These criteria are always based on the same principle: they consist of one part, the sum of squared residuals (or its logarithm), which decreases when the number of estimated parameters increases, and of a 'penalty term', which increases when the number of estimated parameters increases. Whereas the first two criteria overestimate the true (finite) order asymptotically, the other two criteria estimate the true order of the process consistently. For T ≥ 16, the penalty term of SC is larger than the one of HQ, which itself is larger than the one of AIC. This leads to the following ordering of the estimated AR orders: SC order ≤ HQ order ≤ AIC order. Please note that choosing such an order does not always imply that we have white noise residuals. This has to be checked independently.

Many computer programmes like, for example, EViews, do not exactly report the criteria given in (ii) through (iv). Relying on the log-likelihood function instead of on the sum of squared residuals directly, they add 1 + ln(2π) ≈ 2.8379, which does, of course, neither affect the order nor which value of p minimises the information criteria.

Example 2.9
As in Example 2.6, we take a look at the development of the three month money market interest rate in Frankfurt am Main. If, for this series, we estimate AR processes up to the order p = 4, we get the following results (for T = 116):
p = 0: AIC = 4.8334, HQ = 4.8430, SC = 4.8571;
p = 1: AIC = 2.7180, HQ = 2.7373, SC = 2.7655;
p = 2: AIC = 2.4457, HQ = 2.4746, SC = 2.5169;
p = 3: AIC = 2.4609, HQ = 2.4995, SC = 2.5559;
p = 4: AIC = 2.4778, HQ = 2.5260, SC = 2.5965.
With all three criteria we get the minimum for p = 2. Thus, the optimal number of lags is p* = 2, as used in Example 2.6.
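Such an order selection can be carried out in a few lines: AR(p) models are fitted by OLS on a common estimation sample for p = 0, 1, ..., pmax, and the criteria defined above are evaluated. The following Python sketch uses simulated data rather than the Frankfurt series, so the numerical values differ from Example 2.9; the data generating process, the seed and the sample size are assumptions made only for illustration.

```python
import numpy as np

def ar_ols_resid(x, p, t0):
    """OLS residuals of an AR(p) with constant, using observations t0, ..., T-1."""
    T = len(x)
    X = np.column_stack([np.ones(T - t0)] + [x[t0 - i:T - i] for i in range(1, p + 1)])
    y = x[t0:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def order_selection(x, pmax):
    t0 = pmax                      # common estimation sample for all orders
    Teff = len(x) - t0
    for p in range(pmax + 1):
        u = ar_ols_resid(x, p, t0)
        s2 = (u @ u) / Teff        # (1/T) times the sum of squared residuals
        m = p + 1                  # number of estimated parameters, constant included
        aic = np.log(s2) + 2 * m / Teff
        hq  = np.log(s2) + 2 * m * np.log(np.log(Teff)) / Teff
        sc  = np.log(s2) + m * np.log(Teff) / Teff
        print(f"p = {p}: AIC = {aic:.4f}, HQ = {hq:.4f}, SC = {sc:.4f}")

rng = np.random.default_rng(2)
T = 116
x = np.zeros(T)
u = rng.normal(size=T)
for t in range(2, T):              # AR(2) data generating process, chosen for illustration
    x[t] = 0.5 + 1.4 * x[t - 1] - 0.5 * x[t - 2] + u[t]

order_selection(x, pmax=4)         # with this DGP the criteria will typically point to p = 2
```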
  • 32. 58 Univariate Stationary Processes 2.2 Moving Average Processes Moving average processes of an infinite order have already occurred when we presented the Wold decomposition theorem. They are, above all, of theoretical importance as, in practice, only a finite number of (different) parameters can be estimated. In the following, we consider finite order moving average processes. We start with the first order moving average process and then discuss general properties of finite order moving average processes. 2.2.1 First Order Moving Average Processes The first order moving average process (MA(1)) is given by the following equation: (2.38) xt = + ut – ut-1 , or (2.38') xt – = (l – L)ut , with ut again being a pure random process. The Wold representation of an MA(1) process (as of any finite order MA process) has a finite number of terms. In this special case, the Wold coefficients are 0 = 1, 1 = - and j 2 = 0 for j 2. Thus, j is finite for all finite values of , i.e. an MA(1) j process is always stationary. Taking expectations of (2.38) leads to E[xt] = + E[ut] – E[ut-1] = . The variance can also be calculated directly, V[xt] = E[(xt – )2] = E[(ut – ut-1)2] 2 = E[( u 2 – 2 ut ut-1 + t u 2 1 )] t 2 2 = (1 + ) = (0) . Therefore, the variance is constant at any point of time. For the covariances of the process we get E[(xt – )(xt+ – )] = E[(ut – ut-1)(ut+ – ut+ -1)] 2 = E[(utut+ – utut+ –1 – ut-1ut+ + ut-1ut+ -1)] .
  • 33. 2.2 Moving Average Processes 59 The covariances are different from zero only for = ± 1, i.e. for adjoining random variables. In this case 2 (1) = - . Thus, for an MA(1) process, all autocovariances and therefore all autocorre- lations with an order higher than one disappear, i.e. ( ) = ( ) = 0 for 2. The correlogram of an MA(1) process is (0) = 1, (1) = 2 , ( ) = 0 for 2. 1 If we consider (1) as a function of , (1) = f( ), it holds that f(0) = 0 and f( ) = -f(- ), i.e. that f( ) is point symmetric to the origin, and that |f( )| 0.5. f( ) has its maximum at = -1 and its minimum at = 1. Thus, an MA(1) process cannot have a first order autocorrelation above 0.5 or be- low -0.5. If we know the autocorrelation coefficient (1) = 1, for example, by es- timation, we can derive (estimate) the corresponding parameter by using the equation for the first order autocorrelation coefficient, 2 (1 + ) 1 + = 0. The quadratic equation can also be written as 2 1 (2.39) + + 1 = 0, 1 and it has the two solutions 1 2 1,2 = 1 1 4 1 . 2 1 Thus, the parameters of the MA(1) process can be estimated non-linearly with the method of moments: the theoretical moments are substituted by their consistent estimates and the resulting equation is used for estimating the parameters consistently. Because of | 1| 0.5, the quadratic equation always results in real roots. They also have the property that 1 2 = 1. This gives us the possibility to model the same autocorrelation structure with two different parameters, where one is the inverse of the other. In order to get a unique parameterisation, we require a further property of the MA(1) process. We ask under which conditions the MA(1) process (2.38) can have an autoregressive representation. By using the lag operator representation (2.38') we get
  • 34. 60 Univariate Stationary Processes 1 ut = – + xt . 1 1 L An expansion of the series 1/(1 – L) is only possible for < 1 and re- sults in the following AR( ) process 2 ut = – + xt + xt-1 + xt-2 + ... 1 or 2 xt + xt-1 + xt-2 + ... = + ut . 1 This representation requires the condition of invertibility ( < 1). In this case, we get a unique parameterisation of the MA(1) process. Applying the lag polynomial in (2.38'), we can formulate the invertibility condition in the following way: An MA(1) process is invertible if and only if the root of the lag polynomial 1– L = 0 is larger than one in modulus. Example 2.10 The following MA(1) process is given: (E2.5) xt = t – t-1, t ~ N(0, 22), with = -0.5. For this process we get E[xt] = 0, V[xt] = (1 + 0.52)·4 = 5, 0.5 (1) = = 0.4, 1 0.52 ( ) = 0 for 2. Solving the corresponding quadratic equation (2.39) for this value of (1) leads to the two roots 1 = -2.0 and 2 = -0.5. If we now consider the process (E2.5a) yt = t + 2 t-1, t ~ N(0, 1), we obtain the following results: E[yt] = 0, V[yt] = (1 + 2.02)·1 = 5,
  • 35. 2.2 Moving Average Processes 61 2.0 (1) = = 0.4, 1 2.02 ( ) = 0 for 2, i.e. the variances and the autocorrelogram of the two processes (E2.5) and (E2.5a) are identical. The only difference between them is that (E2.5) is invertible, be- cause the invertibility condition < 1 holds, whereas (E2.5a) is not invertible. Thus, given the structure of the correlations, we can choose the one of the two processes that fulfils the invertibility condition without imposing any restrictions on the structure of the process. With equation (2.37), the partial autocorrelation function of the MA(1) process can be calculated in the following way: 11 = (1), 1 (1) (1) 0 (1) 2 22 = = < 0, 1 (1) 1 (1) 2 (1) 1 1 (1) (1) (1) 1 0 0 (1) 0 (1)3 33 = = 0 for 0, 1 (1) 0 1 2 (1) 2 (1) 1 (1) 0 (1) 1 1 (1) 0 (1) (1) 1 (1) 0 0 (1) 1 0 0 0 (1) 0 (1) 4 44 = = < 0, 1 (1) 0 0 (1 (1) 2 ) 2 (1) 2 (1) 1 (1) 0 0 (1) 1 (1) 0 0 (1) 1 etc.
  • 36. 62 Univariate Stationary Processes If is positive, (1) is negative and vice versa. This leads to the two possible patterns of partial autocorrelation functions, exemplified by = ±0.8: = 0.8, ii {-0.49,-0.31,-0.22, -0.17, ... } , = -0.8, ii {0.49,-0.31, 0.22, -0.17, ... } . Thus, contrary to the AR(1) process, the autocorrelation function of the MA(1) process breaks off, while the partial autocorrelation function does not. These properties hold generally, since invertible finite order MA pro- cesses are equivalent to infinite order AR processes. 2.2.2 MA(1) and Temporal Aggregation The time series which are discussed in this book are measured in discrete time, with intervals of equal length. Exchange rates, for example, are nor- mally quoted at the end of each trading day. For econometric analyses, however, monthly, quarterly, or even annual data are used, rather than the- se daily values. Usually, averages or end-of-period data are used for tem- poral aggregation. Thus, two aggregation schemes have to be distinguished. The first one is skip sampling (or: systematic sampling) where only every mth data point is recorded. If xt is the basic series at t = 1, 2, 3,…, the skip sampled series ys with new time scale s is end-of-period data, y1 = xm, y2 = x2m, y3 = x3m, …, ys = xsm. Such an aggregation is typical for stock variables. However, the second scheme of averaging over m non-overlapping periods is also widely used, in particular for rates or indices: 1 y1 xm xm 1 ... x1 m 1 y2 x 2m x 2m 1 ... x m 1 m 1 ys x sm x sm 1 ... x (s 1)m 1 . m
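Both aggregation schemes are easy to mimic numerically. The following Python sketch (aggregation level m, seed and series length are illustrative assumptions) applies skip sampling and averaging to a simulated random walk and reports the first-order autocorrelation of the differenced aggregates, anticipating the MA(1) structure that averaging induces, as discussed below.

```python
import numpy as np

rng = np.random.default_rng(3)
m, S = 4, 50_000                       # aggregation level and number of aggregated periods
x = np.cumsum(rng.normal(size=m * S))  # random walk x_t = x_{t-1} + u_t

# skip sampling (end-of-period values): y_s = x_{sm}
y_skip = x[m - 1::m]

# averaging over m non-overlapping periods: ybar_s = (x_{sm} + ... + x_{(s-1)m+1}) / m
y_avg = x.reshape(S, m).mean(axis=1)

def acf1(z):
    """First-order sample autocorrelation."""
    z = z - z.mean()
    return (z[1:] * z[:-1]).sum() / (z * z).sum()

print("skip sampled differences, rho(1):", round(acf1(np.diff(y_skip)), 3))  # roughly 0: still a random walk
print("averaged differences,     rho(1):", round(acf1(np.diff(y_avg)), 3))   # positive, approaching 0.25 as m grows
```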
  • 37. 2.2 Moving Average Processes 63 In the following, we do not present a general theory of temporal aggrega- tion but just discuss a special case of particular applied interest, the ran- dom walk, with xt = xt-1 + ut, where an artificial MA(1) structure arises due to aggregation by averaging. It is straightforward to see that systematic sampling does not affect the random walk property, since in this case we can write sm ys = x0 + ut . t 1 From this representation we get ys = ys-1 + s, with s being white noise: s = usm + usm-1 + ... + u(s-1)m+1, with E[ s] = 0 and 2 m u for 0 E( s · s– ) = . 0 elsewhere Hence, the random walk property is inherited by ys, only the variance of the differences ys – ys-1 is inflated in the obvious way. In case of averaging, ys , matters get more complicated. It can, however, be shown that the dif- ferences ys ys 1 s follow no longer a white noise process but an MA(1) scheme hidden be- hind 1 s u sm 2u sm 1 ... mu s 1m 1 ... 2u s 2 m 3 us 2 m 2 . m We omit details but refer to HOLBROOK WORKING (1960) who showed that with increasing aggregation level, m , one obtains the autocorre- lation function
  • 38. 64 Univariate Stationary Processes 1, 0 E s s 1 ( ) = , 1 . V s 4 0, elsewhere Note that the above autocorrelation function corresponds to the following MA(1)-process s us us 1 where u s is white noise, and the limiting value (for m ) of the MA pa- rameter is 3 2 0.268. GEORGE C. TIAO (1972) generalised this result the following way: If xt – xt-1 is not generated by white noise but by an invertible MA(1) pro- cess, then ys ys 1 behaves with growing m like the MA(1) process us u s 1 , where is independent of the underlying MA(1) structure of xt – xt-1. This result even continues to hold when the assumption that xt – xt-1 is MA(1) is replaced by a more general moving average process of higher order as introduced in subsection 2.2.3. Example 2.11 Consider averaging over m = 2 periods, 1 ys x 2s x 2s 1 . 2 For the random walk xt = xt-1 + ut, it holds that s ys ys 1 1 = (x2s + x2s-1 – x2s-2 – x2s-3) 2 1 = ( u2s + 2 u2s-1 + u2s-2) . 2 This process can be described as s us us 1 with = 2 2 – 3 –0.172, and
  • 39. 2.2 Moving Average Processes 65 3 2 u for 0 2 1 2 E( s · s ) = u for 1, 4 0 elsewhere such that for m = 2 the autocorrelation coefficient at lag one becomes (1) = 1/6. Example 2.12 Example 1.3 as well as Figure 1.8 present the end-of-month exchange rate be- tween the Swiss Franc and the U.S. Dollar over the period from January 1974 to December 2011. The autocorrelogram of the first differences of the logarithms of this time series indicates that they follow a pure random process. The tests we ap- plied did not reject this null hypothesis. If we use monthly averages instead of end-of-month data, the following MA(1) process can be estimated for the first difference of the logarithms of this exchange rate: ln(et) = -0.003 + ût + 0.308 ût-1, (-1.53) (6.91) R2 = 0.082, SE = 0.028, Q(11) = 8.216 (p = 0.694), JB = 21.194 (p = 0.000), with the t values again given in parentheses. ln(·) denotes the natural logarithm. The estimated coefficient of the MA(1) term is highly significantly different from zero. The Ljung-Box Q-statistic indicates that there is no longer any significant autocorrelation in the residuals. As m 20 is relatively large (in this context), the estimated values of the MA(1) term should not be too different from the theoreti- cal value given by GEORGE C. TIAO (1972). The theoretical value -0.268 lies in the two-sigma confidence interval of the estimated parameter -0.308. 2.2.3 Higher Order Moving Average Processes In general, the moving average process of order q (MA(q)) can be written as (2.40) xt = + ut – 1 ut-1 – 2 ut-2 – ... – q ut-q with q 0 and ut as a pure random process. Using the lag operator we get 2 q (2.40') xt – = (1 – 1L – 2L – ... – qL )ut = (L)ut .
  • 40. 66 Univariate Stationary Processes From (2.40) we see that we already have a finite order Wold representation with k = 0 for k > q. Thus, there are no problems of convergence, and every finite MA(q) process is stationary, no matter what values are used for j, j = 1, 2, ..., q. For the expectation of (2.40) we immediately get E[xt] = . Thus, the variance can be calculated as: V[xt] = E[(xt – )2] = E[(ut – 1 ut-1 – ... – q ut-q)2] 2 = E[( u 2 + t 2 1 u 2 1 + ... + t q u2 q – 2 t 1 utut-1 – ... –2 q-1 q ut-q+1ut-q)] . From this we obtain 2 2 2 2 V[xt] = (1 + 1 + 2 + ... + q ) . For the covariances of order we can write Cov[xt, xt+ ] = E[(xt – )(xt+ – )] = E[(ut – 1ut-1 – ... – q ut-q) (ut+ – 1 ut+ -1 – ... – q ut+ -q)] = E[ut(ut+ – 1 ut+ -1 – ... – q ut+ -q) – 1 ut-1(ut+ – 1 ut+ -1 – ... – q ut+ -q) – q ut-q(ut+ – 1 ut+ -1 – ... – q ut+ -q)] . Thus, for = 1, 2, ..., q we get 2 = 1: (1) = (– 1 + 1 2 + ... + q-1 q) , 2 = 2: (2) = (– 2 + 1 3 + ... + q-2 q) , (2.41) 2 = q: (q) = – q , while we have ( ) = 0 for > q. Consequently, all autocovariances and autocorrelations with orders higher than the order of the process are zero. It is – at least theoretically – possible to identify the order of an MA(q) process by using the autocorre- logram. It can be seen from (2.41) that there exists a system of non-linear equa- tions for given (or estimated) second order moments that determines (makes it possible to estimate) the parameters 1, ..., q. As we have al-
  • 41. 2.2 Moving Average Processes 67 ready seen in the case of the MA(1) process, such non-linear equation sys- tems have multiple solutions, i.e. there exist different values for 1, 2, ... and q that all lead to the same autocorrelation structure. To get a unique parameterisation, the invertibility condition is again required, i.e. it must be possible to represent the MA(q) process as a stationary AR( ) process. Starting from (2.40'), this implies that the inverse operator -1(L) can be represented as an infinite series in the lag operator, where the sum of the coefficients has to be bounded. Thus, the representation we get is an AR( ) process -1 ut = – + (L) xt (1) = – + c jx t j , (1) j 0 where q 1 = (1 – 1L – ... – qL )( 1 + c1L + c2L2 + ... ), and the parameters ci, i = 1, 2, ... are calculated by using again the method of undetermined coefficients. Such a representation exists if all roots of q 1– 1L – ... – qL = 0 are larger than one in absolute value. Example 2.13 Let the following MA(2) process xt = ut + 0.6 ut-1 – 0.1 ut-2 be given, with a variance of 1 given for the pure random process u. For the vari- ance of x we get V[xt] = (1 + 0.36 + 0.01) 1 = 1.37 . Corresponding to (2.41) the covariances are (1) = + 0.6 – 0.06 = 0.54 (2) = – 0.1 . ( ) = 0 for > 2 This leads to the autocorrelation coefficients (1) = 0.39 and (2) = -0.07. To check whether the process is invertible, the quadratic equation 1 + 0.6 L 0.1 L2 = 0
  • 42. 68 Univariate Stationary Processes has to be solved. As the two roots -1.36 and 7.36 are larger than 1 in absolute val- ue, the invertibility condition is fulfilled, i.e. the MA(2) process can be written as an AR( ) process xt = (1 + 0.6 L – 0.1 L2) ut , 1 ut = xt 1 0.6L 0.1L2 = (1 + c1 L + c2 L2 + c3 L3 + ) xt . The unknowns ci, i = 1, 2, ..., can be determined by comparing the coefficients of the polynomials in the following way: 1 = (1 + 0.6 L – 0.1 L2)(1 + c1 L + c2 L2 + c3 L3 + ) 2 3 1 = 1 + c1 L + c2 L + c3 L + + 0.6 L + 0.6 c1 L + 0.6 c2 L3 + 2 0.1 L2 0.1 c1 L3 It holds that c1 + 0.6 = 0 c1 = 0.60, c2 + 0.6 c1 – 0.1 = 0 c2 = 0.46, c3 + 0.6 c2 – 0.1 c1 = 0 c3 = 0.34, c4 + 0.6 c3 – 0.1 c2 = 0 c4 = 0.25, . Thus, we get the following AR( ) representation xt – 0.6 xt-1 + 0.46 xt-2 – 0.34 xt-3 + 0.25 xt-4 = ut . Similarly to the MA(1) process, the partial autocorrelation function of the MA(q) process does not break off. As long as the order q is finite, the MA(q) process is stationary whatever its parameters are. If the order tends towards infinity, howev- er, for the process to be stationary the series of the coefficients has to converge just like in the Wold representation. 2.3 Mixed Processes If we take a look at the two different functions that can be used to identify autoregressive and moving average processes, we see from Table 2.1 that the situation in which neither of them breaks off can only arise if there is an MA( ) process that can be inverted to an AR( ) process, i.e. if the Wold representation of an AR( ) process corresponds to an MA( ) pro- cess. However, as pure AR or MA representations, these processes cannot
be used for empirical modelling because they can only be characterised by means of infinitely many parameters. After all, according to the principle of parsimony, the number of estimated parameters should be as small as possible when applying time series methods.

In the following, we introduce processes which contain both an autoregressive (AR) term of finite order p and a moving average (MA) term of finite order q. Hence, these mixed processes are denoted as ARMA(p,q) processes. They enable us to describe processes in which neither the autocorrelation nor the partial autocorrelation function breaks off after a finite number of lags. Again, we start with the simplest case, the ARMA(1,1) process, and consider the general case afterwards.

Table 2.1: Characteristics of the Autocorrelation and the Partial Autocorrelation Functions of AR and MA Processes

            Autocorrelation Function     Partial Autocorrelation Function
MA(q)       breaks off with q            does not break off
AR(p)       does not break off           breaks off with p

2.3.1 ARMA(1,1) Processes

An ARMA(1,1) process can be written as follows,

(2.42) xt = δ + α xt-1 + ut – β ut-1,

or, by using the lag operator,

(2.42') (1 – αL) xt = δ + (1 – βL) ut,

where ut is a pure random process. To get the Wold representation of an ARMA(1,1) process, we solve (2.42') for xt,

xt = δ/(1 – α) + ((1 – βL)/(1 – αL)) ut.

It is obvious that α ≠ β must hold, because otherwise xt would be a pure random process fluctuating around the mean μ = δ/(1 – α). The ψj, j = 0, 1, 2, ..., can be determined as follows:
  • 44. 70 Univariate Stationary Processes 1 L 2 3 = 0 + 1L + 2L + 3L + … 1 L 2 3 1 – L = (1 – L)( 0 + 1L + 2L + 3L + …) 2 3 1– L = 0 + 1L + 2L + 3L + … 2 3 – 0 L – 1L – 2L – … . Comparing the coefficients of the two lag polynomials we get L0: 0 = 1 L1: 1 – 0 = – 1 = – L2: 2 – 1 = 0 2 = ( – ) L3: 3 – 2 = 0 3 = 2 ( – ) Lj: j – j-1 = 0 j = j-1 ( – ). The j, j 2 can be determined from the linear homogeneous difference equation j – j-1 =0 with 1 = – as initial condition. The j converge towards zero if and only if | | < 1. This corresponds to the stability condition of the AR term. Thus, the ARMA(1,1) process is stationary if, with stochastic initial condi- tions, it has a stable AR(1) term. The Wold representation is 2 (2.43) xt = + ut + ( – ) ut-1 + ( – ) ut-2 + ( – ) ut-3 + ... . 1 Thus, the ARMA(1,1) process can be written as an MA( ) process. To invert the MA(1) part, | | < 1 must hold. Starting from (2.42') leads to 1 L ut = + xt . 1 1 L If 1/(1 – L) is developed into a geometric series we get 2 2 ut = + (1 – L)(1 + L + L + ... ) xt 1 2 = + xt + ( – ) xt-1 + ( – ) xt-2 + ( – ) xt-3 + ... . 1
  • 45. 2.3 Mixed Processes 71 This proves to be an AR( ) representation. It shows that the combination of an AR(1) and an MA(1) term leads to a process with both MA( ) and AR( ) representation if the AR term is stable and the MA term invertible. We obtain the first and second order moments of the stationary process in (2.42) as follows: E[xt] = E[ + xt-1 + ut – ut-1] = + E[xt-1] . Due to E[xt] = E[xt-1] = , we get = , 1 i.e. the expectation is the same as in an AR(1) process. If we set = 0 without loss of generality, the expectation is zero. The autocovariance of order 0 can then be written as (2.44) E[xt- xt] = E[xt- ( xt-1 + ut – ut-1)], which leads to (0) = (1) + E[xtut] – E[xtut-1] 2 2 for = 0. Due to (2.43), E[xtut] = and E[xtut-1] = ( – ) . Thus, we can write 2 (2.45) (0) = (1) + (1 – ( – )) . (2.44) leads to (1) = (0) + E[xt-1ut] – E[xt-1ut-1] for = 1. Because of (2.43) this can be written as 2 (2.46) (1) = (0) – . If we insert (2.46) in (2.45) and solve for (0), the resulting variance of the ARMA(1,1) process is 2 1 2 2 (2.47) (0) = 2 . 1 Inserting this into (2.46), we get ( )(1 ) 2 (2.48) (1) = 2 1
  • 46. 72 Univariate Stationary Processes for the first order autocovariance. For 2, (2.44) results in the autoco- variances (2.49) ( ) = ( -1) and the autocorrelations (2.50) ( ) = ( -1) . This results in the same difference equation as in an AR(1) process but, however, with the different initial condition ( )(1 ) (1) = 2 . 1 2 The first order autocorrelation coefficient is influenced by the MA term, while the higher order autocorrelation coefficients develop in the same way as in an AR(1) process. If the process is stable and invertible, i.e. for | | < 1 and | | < 1, the sign of (1) is determined by the sign of ( – ) because of (1 + 2 – 2 ) > 0 and (1 – ) > 0. Moreover, it follows from (2.49) that the autocorrelation function – as in the AR(1) process – is monotonic for > 0 and oscillating for < 0. Due to | | < 1 with increasing, the autocorrelation function also decreases in absolute value. Thus, the following typical autocorrelation structures are possible: (i) > 0 and > : The autocorrelation function is always positive. (ii) < 0 and < : The autocorrelation function oscillates; the initial condition (1) is negative. (iii) > 0 and < : The autocorrelation function is negative from (1) onwards. (iv) < 0 and > : The autocorrelation function oscillates; the initial condition (1) is positive. Figure 2.9 shows the development of the corresponding autocorrelation functions up to = 20 for the parameter values , {0.8, 0.5, -0.5, -0.8} in which, of course, must always hold, as otherwise the ARMA(1,1) process degenerates to a pure random process. For the partial autocorrelation function we get ( )(1 ) 11 = (1) = 2 , 1 2
Figure 2.9: Theoretical autocorrelation functions of ARMA(1,1) processes
  • 48. 74 Univariate Stationary Processes 1 (1) (1) (2) (2) (1)2 (1)( (1)) 22 = = = 2 , 1 (1) 1 (1)2 1 (1) (1) 1 because of (2) = (1), 1 (1) (1) 1 (1) (1) (1) 1 (2) (1) 1 (1) 2 (2) (1) (3) (1) (1) (1) 33 = = 1 (1) (2) 1 2 (1)3 (1) 2 (2 2 ) (1) 1 (1) (2) (1) 1 (1)( (1)) 2 = , etc. 1 2 (1)3 (1) 2 (2 2 ) Thus, the ARMA(1,1) process is a stationary stochastic process where nei- ther the autocorrelation nor the partial autocorrelation function breaks off. The following example shows how, due to measurement error, an AR(1)-process becomes an ARMA(1,1) process. Example 2.14 The ‘true’ variable x t is generated by a stationary AR(1) process, (E2.8) xt = xt 1 + ut , but it can only be measured with an error vt, i.e. for the observed variable xt it holds that (E2.9) xt = x t + vt , where vt is a pure random process uncorrelated with the random process ut. (The same model was used in Example 2.3 but with a different interpretation.) If we transform (E2.8) to ut xt = 1 L and insert it into (E2.9) we get (1 – L) xt = ut + vt – vt-1 .
  • 49. 2.3 Mixed Processes 75 For the combined error term t = ut + vt – vt-1 we get 2 2 2 (0) = u + (1 + ) v 2 (1) = - v ( ) = 0 for 2, or 2 v (1) = 2 2 2 , ( ) = 0 for 2. u (1 ) v Thus, the observable variable xt follows an ARMA(1,1) process, (1 – L) xt = (1 – L) t , where can be calculated by means of (1) and t is a pure random pro- cess. (See also the corresponding results in Section 2.2.1.) 2.3.2 ARMA(p,q) Processes The general autoregressive moving average process with AR order p and MA order q can be written as (2.51) xt = + 1 xt-1 + ... + p xt-p + ut – 1 ut-1 – ... – q ut-q , with ut being a pure random process and p 0 and q 0 having to hold. Using the lag operator, we can write p q (2.51') (1 – 1L – ... – pL ) xt = + (1 – 1L – ... – qL ) ut , or (2.51'') (L) xt = + (L) ut . As factors that are common in both polynomials can be reduced, (L) and (L) cannot have identical roots. The process is stationary if – with sto- chastic initial conditions – the stability conditions of the AR term are ful- filled, i.e. if (L) only has roots that are larger than 1 in absolute value. Then we can derive the Wold representation for which (L) = (L)(1 + 1L + 2 L2 + ... ) must hold. Again, the j, j = 1, 2, ..., can be calculated by comparing the coefficients. If, likewise, all roots of (L) are larger than 1 in absolute val- ue, the ARMA(p,q) process is also invertible. A stationary and invertible ARMA(p,q) process may either be repre- sented as an AR( ) or as an MA( ) process. Thus, neither its autocorrela-
  • 50. 76 Univariate Stationary Processes tion nor its partial autocorrelation function breaks off. In short, it is possi- ble to generate stationary stochastic processes with infinite AR and MA orders by using only a finite number of parameters. Under the assumption of stationarity, (2.51) directly results in the con- stant mean E[xt] = = . 1 1 p If, without loss of generality, we set = 0 and thus also = 0, we get the following relation for the autocovariances: ( ) = E[xt- xt] = E[xt- ( 1 xt-1 + ... + p xt-p + ut – 1 ut-1 – ... – q ut-q)] . This relation can also be written as ( ) = 1 ( -1) + 2 ( -2) + ... + p ( -p) + E[xt- ut] – 1 E[xt- ut-1] – ... – q E[xt- ut-q] . Due to the Wold representation, the covariances between xt- and ut-i, i = 0, ..., q, are zero for > q, i.e. the autocovariances for > q and > p are gen- erated by the difference equation of an AR(p) process, ( ) – 1 ( -1) – 2 ( -2) – ... – p ( -p) = 0 for > q >p whereas the first q autocovariances are also influenced by the MA part. Normalisation with (0) leads to exactly the same results for the autocorre- lations. If the orders p and q are given and the distribution of the white noise process ut is known, the parameters of an ARMA(p,q) process can be esti- mated consistently by using maximum likelihood methods. These esti- mates are also asymptotically efficient. If there is no such programme available, it is possible to estimate the parameters consistently with least squares. As every invertible ARMA(p,q) process is equivalent to an AR( ) process, first of all an AR(k) process is estimated with k sufficient- ly larger than p. From this, one can get estimates of the non-observable re- siduals ût. By employing these residuals, the ARMA(p,q) process can be estimated with the least squares method, xt = + 1 xt-1 + ... + p xt-p – 1 ût-1 – ... – q ût-q + vt . This approach can also be used if p and q are unknown. These orders can, for example, be determined by using the information criteria shown in Sec- tion 2.1.5.
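The two-step least squares idea just described can be sketched in code: a long AR(k) is estimated first to obtain residuals ût, which then serve as regressors for the lagged MA terms. The Python sketch below uses a simulated ARMA(1,1) process; the parameter values, the choice of k, the seed and the sample size are assumptions for illustration only, and maximum likelihood estimation would normally be preferred when a suitable programme is available.

```python
import numpy as np

def ols(y, X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

rng = np.random.default_rng(4)
T, alpha, beta_ma = 1_000, 0.8, 0.5          # illustrative ARMA(1,1): x_t = a x_{t-1} + u_t - b u_{t-1}
u = rng.normal(size=T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = alpha * x[t - 1] + u[t] - beta_ma * u[t - 1]

# step 1: long autoregression AR(k) to approximate the AR(infinity) representation
k = 10
Xk = np.column_stack([np.ones(T - k)] + [x[k - i:T - i] for i in range(1, k + 1)])
_, u_hat_tail = ols(x[k:], Xk)
u_hat = np.concatenate([np.zeros(k), u_hat_tail])   # pad so that indices line up with x

# step 2: regress x_t on a constant, x_{t-1} and the lagged residual estimate u_hat_{t-1}
t0 = k + 1
X2 = np.column_stack([np.ones(T - t0), x[t0 - 1:T - 1], u_hat[t0 - 1:T - 1]])
coef, _ = ols(x[t0:], X2)
# roughly 0, 0.8 and -0.5; the coefficient on u_hat_{t-1} estimates -beta_ma
print("constant, AR(1), coefficient on u_hat_{t-1}:", np.round(coef, 3))
```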
Figure 2.10: Three month money market rate in New York, 1994 – 2003
a) New York three month money market rate, 1994 – 2003 (percent)
b) Autocorrelation and partial autocorrelation functions of the first differences with confidence intervals
c) Autocorrelation function of the residuals of the estimated ARMA(1,1) process with confidence intervals
  • 52. 78 Univariate Stationary Processes Example 2.15 Figure 2.10 shows the development of the US three month money market rate (USR) as well as the estimated autocorrelation and partial autocorrelation function of the first differences of this time series for the period from March 1994 to Au- gust 2003 (114 observations). Both functions do not show a clear break-off behav- iour. Therefore, the following ARMA(1,1) model has been estimated for this time series: USRt = – 0.006 + 0.831 USRt-1 + ût – 0.457 ût-1,. (-0.73) (10.91) (-3.57) R 2 = 0.351, SE = 0.166, Q(10) = 7.897 (p = 0.639). The AR(1) as well as the MA(1) terms are different from zero and from one at any usual significance level. The autocorrelogram of the estimated residuals, which is also given in Figure 2.10, as well as the Ljung-Box Q statistic, which is calculated for this model with 12 autocorrelation coefficients (i.e. with 10 degrees of free- dom), do not provide any evidence of a higher order process. 2.4 Forecasting As mentioned in the introduction, in the 1970’s, one of the reasons for the broad acceptance of time series analysis using the Box-Jenkins approach was the fact that forecasts with this comparably simple method often out- performed forecasts generated by large econometric models. In the follow- ing, we show how ARMA models can be used for making forecasts about the future development of time series. In doing so, we assume that all ob- servations of the time series up to time t are known. 2.4.1 Forecasts with Minimal Mean Squared Errors We want to solve the problem of making a -step ahead forecast for xt with a linear prediction function, given a stationary and/or invertible data gen- erating process. ˆ ˆ Let x t ( ) be such a prediction function for xt+ . Thus, x t ( ) is a random variable for given t and . As all stationary ARMA processes have a Wold representation, we assume the existence of such a representation without loss of generality. Thus, 2 xt = + j ut j , 0 = 1, j < , j 0 j 0
  • 53. 2.4 Forecasting 79 where ut is a pure random process with the usual properties E[ut] = 0, 2 for t s E[utus] = . 0 for t s Therefore, it also holds that (2.52) xt+ = + j ut j , = 1, 2, ... . j 0 For a linear prediction function with the information given up to time t, we assume the following representation (2.53) ˆ xt ( ) = + k ut k , = 1, 2, ... , k 0 where the k , k = 0, 1, 2, ..., = 1, 2, ..., are unknown. The forecast error ˆ of a -step forecast is ft( ) = xt+ – x t ( ), = 1, 2, ..., . In order to make a good forecast, these errors should be small. The expected quadratic fore- cast error E[(xt+ – x t ( ))2], which should be minimised, is used as the cri- ˆ terion to determine the unknowns k . Taking into account (2.52) and (2.53) we can write 2 2 E [ f ( )] = E t j ut j k ut k j 0 k 0 2 = E ut 1u t 1 1u t 1 ( k k )u t k . k 0 From this it follows that 2 2 2 2 2 2 (2.54) E [ f ( )] = t 1 1 1 k k . k 0 The variance of the forecast error reaches its minimum if we set k = +k for k = 0, 1, 2, ..., . Thus, we get the optimal linear prediction function for a -step ahead forecast from (2.53), as (2.55) ˆ xt ( ) = + k ut k , = 1, 2, ... . k 0
  • 54. 80 Univariate Stationary Processes For the conditional expectation of ut+s, given ut, ut-1, …, it holds that ut s for s 0 E[ut+s|ut, ut-1, ...] = . 0 for s 0 Thus, we get the conditional expectation of xt+ , because of (2.52), as E[xt+ |ut, ut-1, ...] = + k ut k . k 0 Due to (2.55), the conditional expectation of xt+ , with all information available at time t given, is identical to the optimal prediction function. This leads to the following result: The conditional expectation of xt+ , with all information up to time t given, provides the -step forecast with mini- mal mean squared prediction error. With (2.52) and (2.55) the -step forecast error can be written as ˆ (2.56) ft( ) = xt+ – x t ( ) = ut+ + 1ut+ -1 + 2ut+ -2 + ... + -1ut+1 with E[ft( )|ut, ut-1, ...] = E[ft( )] = 0 . From these results we can immediately draw some conclusions: 1. Best linear unbiased predictions (BLUP) of stationary ARMA process- es are given by the conditional expectation for xt+ , = 1,2, … ˆ x t ( ) = E[xt+ |xt, xt-1, ...] = Et[xt+ ] . 2. For the one-step forecast errors ( = 1), ft(1) = ut+1, we get E[ft(1)] = E[ut+1] = 0, and 2 for t s E[ft(1)fs(1)] = E[ut+1us+1] = . 0 for t s The one-step forecast errors are a pure random process; they are identi- cal with the residuals of the data generating process. If the one-step prediction errors were correlated, the prediction could be improved by using the information contained in the prediction errors. In such a case, ˆ however, x t (1) would not be an optimal forecast. 3. For the -step forecast errors ( > 1) we get ft( ) = ut+ + 1ut+ -1 + 2ut+ -2 + ... + -1ut+1 ,
  • 55. 2.4 Forecasting 81 i.e. they follow a MA( -1) process with E[ft( )] = 0 and the variance 2 2 2 (2.57) V[ft( )] = 1 1 1 . This variance can be used for constructing confidence intervals for - step forecasts. However, these intervals are too narrow for practical ap- plications because they do not take into account the uncertainty in the estimation of the parameters i, i = 1, 2, ..., -1. 4. It follows from (2.57) that the forecast error variance increases mono- tonically with increasing forecast horizon : V[ft( )] V[ft( -1)] . 5. Due to (2.57) we get for the limit 2 2 2 2 2 lim V[ft( )] = lim 1 1 1 = j = V[xt] , j 0 i.e. the variance of the -step forecast error is not larger than the vari- ance of the underlying process. 6. The following variance decomposition follows from (2.55) and (2.56): (2.58) ˆ V[xt+ ] = V[ x t ( )] + V[ft( )] . 7. Furthermore, ˆ lim x t ( ) = lim k ut k = = E[xt] , k 0 i.e. for increasing forecast horizons, the forecasts converge to the (un- conditional) mean of the series. The concept of ‘weak’ rational expectations whose information set is re- stricted to the current and past values of a variable exactly corresponds to the optimal prediction approach used here. 2.4.2 Forecasts of ARMA(p,q) Processes The Wold decomposition employed in the previous section has advantages when it comes to the derivation of theoretical results, but it is not practical- ly useful for forecasting. Thus, in the following, we will discuss forecasts directly using AR, MA, or ARMA representations.
  • 56. 82 Univariate Stationary Processes Forecasts with a Stationary AR(1) Process For this process, it holds that xt = + xt-1 + ut , with | | < 1. The optimal -step forecast is the conditional mean of xt+ , i.e. Et[xt+ ] = Et[ + xt+ -1 + ut+ ] = + Et[xt+ -1] . Due to the first conclusion, we get the following first order difference equation for the prediction function ˆ xt ( ) = + ˆ x t ( -1) , which can be solved recursively: ˆ = 1: x t (1) = + ˆ x t (0) = + xt 2 ˆ = 2: x t (2) = + ˆ x t (1) = + + xt -1 ˆ xt ( ) = (1 + + ... + ) + xt 1 ˆ xt ( ) = + xt = + (xt – ). 1 1 1 As = /(1 – ) is the mean of a stationary AR(1) process, ˆ xt ( ) = + ˆ (xt – ) with lim x t ( ) = , i.e., with increasing forecast horizon , the predicted values of an AR(1) process converge geometrically to the unconditional mean of the pro- cess. The convergence is monotonic if is positive, and oscillating if is negative. To calculate the -step prediction error, the Wold representation, i.e. the MA( ) representation of the AR(1) process, can be used, 2 3 xt = + ut + ut-1 + ut-2 + ut-3 + ... . Due to (2.56) and (2.57) we get the MA( -1) process 2 -1 ft( ) = ut+ + ut+ -1 + ut+ -2 + ... + ut+1 for the forecast error with the variance 2 2 2( 1) 2 1 2 V[ft( )] = 1 = 2 . 1
  • 57. 2.4 Forecasting 83 With increasing forecast horizons, it follows that 2 lim V[ft( )] = 2 = V[xt] , 1 i.e. the prediction error variance converges to the variance of the AR(1) process. Forecasts with Stationary AR(p) Processes Starting with the representation xt = + 1 xt-1 + 2 xt-2 + ... + p xt-p + ut , the conditional mean of xt+ is given by Et[xt+ ] = + 1 Et[xt+ -1] + ... + p Et[xt+ -p] . Here, ˆ x t (s) for s 0 Et[xt+s] = . x t s for s 0 Thus, the above difference equation can be solved recursively: ˆ = 1: x t (1) = + 1 xt + 2 xt-1 + ...+ p xt+1-p ˆ = 2: x t (2) = + 1 ˆ x t (1) + 2 xt + ... + p xt+2-p , etc. Forecasts with an Invertible MA(1) Process For this process, it holds that xt = + ut – ut-1 with | | < 1. The conditional mean of xt+ is Et[xt+ ] = + Et[ut+ ] – Et[ut+ -1] . For = 1, this leads to (2.59) ˆ x t (1) = – ut , and for 2, we get ˆ xt ( ) = , i.e. the unconditional mean is the optimal forecast of xt+ , = 2, 3, ..., . For the -step prediction errors and their variances we get:
   f_t(1) = u_{t+1},                        V[f_t(1)] = σ²
   f_t(2) = u_{t+2} − β u_{t+1},            V[f_t(2)] = (1 + β²) σ²
   f_t(τ) = u_{t+τ} − β u_{t+τ-1},          V[f_t(τ)] = (1 + β²) σ².

To be able to perform the one-step forecasts (2.59), the unobservable variable u has to be expressed as a function of the observable variable x. To do this, it must be taken into account that for s ≤ t, the one-step forecast errors can be written as

(2.60)   u_s = x_s − x̂_{s-1}(1).

For t = 0, we get from (2.59)

   x̂_0(1) = μ − β u_0

with the non-observable but fixed u_0. Taking (2.60) into account, we get for t = 1

   x̂_1(1) = μ − β u_1 = μ − β (x_1 − x̂_0(1)) = μ − β x_1 + β (μ − β u_0)
           = (1 + β) μ − β x_1 − β² u_0.

Correspondingly, we get for t = 2

   x̂_2(1) = μ − β u_2 = μ − β (x_2 − x̂_1(1))
           = μ − β x_2 + β ((1 + β) μ − β x_1 − β² u_0)
           = (1 + β + β²) μ − β x_2 − β² x_1 − β³ u_0.

If we continue this procedure, the so-called backcasting, we finally arrive at a representation of the one-step prediction which, except for u_0, consists only of observable terms,

   x̂_t(1) = (1 + β + ... + β^t) μ − β x_t − β² x_{t-1} − ... − β^t x_1 − β^{t+1} u_0.

Due to the invertibility of the MA(1) process, i.e. for |β| < 1, the impact of the unknown initial value u_0 finally disappears. Similarly, one can show that, after q forecast steps, the optimal forecasts of invertible MA(q) processes, q > 1, are equal to the unconditional mean of the process and that the variance of the forecast errors is equal to the variance of the underlying process. The forecasts in observable terms are represented similarly to those of the MA(1) process.
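A minimal sketch of this backcasting recursion follows; the values of μ and β and the simulated sample are assumptions made only for illustration, and the unknown u_0 is simply set to zero, which the vanishing influence of the initial value justifies for moderately long series.

```python
import random

# Minimal sketch of one-step MA(1) forecasting via backcasting, x_t = mu + u_t - beta*u_{t-1}.
# mu, beta and the simulated sample are illustrative assumptions.
random.seed(1)
mu, beta, n = 2.0, 0.5, 200

# Simulate an MA(1) sample of length n.
u = [random.gauss(0.0, 1.0) for _ in range(n + 1)]
x = [mu + u[t] - beta * u[t - 1] for t in range(1, n + 1)]

# Backcasting: u_s = x_s - xhat_{s-1}(1), starting from an assumed u_0 = 0.
u_hat = 0.0                      # stands in for the unobservable u_0
for x_s in x:
    x_hat = mu - beta * u_hat    # one-step forecast (2.59) made at s-1
    u_hat = x_s - x_hat          # recovered innovation via (2.60)

print(f"one-step forecast xhat_t(1) = {mu - beta * u_hat:.4f}")
print(f"infeasible value mu - beta*u_t = {mu - beta * u[n]:.4f}")  # nearly identical for |beta| < 1
```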
Forecasts with ARMA(p,q) Processes

Forecasts for these processes result from combining the approaches for pure AR and MA processes. Thus, the one-step ahead forecast for a stationary and invertible ARMA(1,1) process is given by

   x̂_t(1) = δ + α x_t − β u_t.

Starting with t = 0 and taking (2.60) into account, forecasts are successively generated by backcasting. We first get

   x̂_0(1) = δ + α x_0 − β u_0,

where x_0 and u_0 are assumed to be any fixed numbers. For t = 1 we get

   x̂_1(1) = δ + α x_1 − β u_1 = δ + α x_1 − β (x_1 − x̂_0(1))
           = (1 + β) δ + (α − β) x_1 + αβ x_0 − β² u_0,

which finally leads to

(2.61)   x̂_t(1) = (1 + β + ... + β^t) δ + (α − β) x_t + β (α − β) x_{t-1} + ...
                  + β^{t-1} (α − β) x_1 + α β^t x_0 − β^{t+1} u_0.

Due to the invertibility condition, i.e. for |β| < 1, the one-step forecast for large values of t no longer depends on the unknown initial values x_0 and u_0. For the τ-step forecast, τ = 2, 3, ..., we get

   x̂_t(2) = δ + α x̂_t(1)
   x̂_t(3) = δ + α x̂_t(2)
   ...

Using (2.61), these forecasts can be calculated recursively.

2.4.3 Evaluation of Forecasts

Forecasts can be evaluated ex post, i.e. when the realised values are available. There are many kinds of measures to do this. Quite often, only graphs and/or scatter diagrams of the predicted values and the corresponding observed values of a time series are plotted. Intuitively, a forecast is 'good' if the predicted values describe the development of the series in the graphs relatively well, or if the points in the scatter diagram are concentrated around the angle bisecting line in the first and/or third quadrant.
Such intuitive arguments are, however, not founded on the above-mentioned considerations on optimal predictions. For example, as (2.59) shows, the optimal one-step forecast of an MA(1) process is a pure random process. This implies that the graphs compare two quite different processes. Conclusion 6 given above states that the following decomposition holds for the variances of the data generating process, the forecasts and the forecast errors,

   V[x_{t+τ}] = V[x̂_t(τ)] + V[f_t(τ)].

Thus, it is obvious that predicted and realised values are generally generated by different processes. As a result, a measure for the predictability of stationary processes can be developed. It is defined as follows,

(2.62)   P(τ)² = V[x̂_t(τ)] / V[x_t] = 1 − V[f_t(τ)] / V[x_t],

with 0 ≤ P(τ)² ≤ 1. At the same time, P(τ)² is the square of the correlation coefficient between the predicted and the realised values of x. The optimal forecast of a pure random process with mean zero is x̂_t(τ) = 0, i.e. P(τ)² = 0. Such a process cannot be predicted. On the other hand, for the one-step forecast of an MA(1) process, we can write

   P(1)² = β² σ² / ((1 + β²) σ²) = β² / (1 + β²) > 0.

However, the decomposition (2.58), which is theoretically valid for optimal forecasts, does not hold for actual (empirical) forecasts, even if they are generated by using (estimated) ARMA processes. This is due to the fact that forecast errors are hardly ever totally uncorrelated with the forecasts. Therefore, the value of P(τ)² might even become negative for 'bad' forecasts.

JACOB MINCER and VICTOR ZARNOWITZ (1969) made the following suggestion to check the consistency of forecasts. By using OLS, the following regression equation is estimated:

(2.63)   x_{t+τ} = a_0 + a_1 x̂_t(τ) + ε_{t+τ}.

It is tested, either individually with t tests or jointly with an F test, whether a_0 = 0 and a_1 = 1. If this is fulfilled, the forecasts are said to be consistent. However, such a regression produces consistent estimates of the parameters if and only if x̂_t(τ) and ε_{t+τ} are asymptotically uncorrelated.
Moreover, to get consistent estimates of the variances, which is necessary for the validity of the test results, the residuals have to be pure random processes. Even under the null hypothesis of optimal forecasts, this only holds for one-step predictions. Thus, the usual F and t tests can only be used for τ = 1. For τ > 1, the MA(τ-1) structure of the forecast errors has to be taken into account when the variances are estimated. A procedure for such situations, which combines ordinary least squares for the estimation of the parameters with generalised least squares for the estimation of the variances, was proposed by BRYAN W. BROWN and SHLOMO MAITAL (1981).

JINOOK JEONG and GANGADHARRAO S. MADDALA (1991) have pointed out another problem related to these tests. Even rational forecasts are usually not without errors; they contain measurement errors. This implies, however, that (2.63) cannot be estimated consistently with OLS; an instrumental variables estimator must be used. An alternative to the estimation of (2.63) is therefore to estimate a univariate MA(τ-1) model for the forecast errors of a τ-step prediction,

   f̂_t(τ) = a_0 + u_t + a_1 u_{t-1} + a_2 u_{t-2} + ... + a_{τ-1} u_{t-τ+1},

and to check the null hypothesis H_0: a_0 = 0 as well as whether the estimated residuals û_t are white noise.

On the other hand, simple descriptive measures, which are often employed to evaluate the performance of forecasts, are based on the average values of the forecast errors over the forecast horizon. The simple arithmetic mean indicates whether the values of the variable are, on average, over- or underestimated. The disadvantage of this measure, however, is that large over- and underestimates cancel each other out. The mean absolute error is often used to avoid this effect. Starting the forecasts from a fixed point in time, t_0, and assuming that realisations are available up to t_0 + m, we get

   MAE(τ) = (1/(m+1)) Σ_{j=0}^{m} |f_{t_0+j}(τ)|,   τ = 1, 2, ... .

Every forecast error gets the same weight in this measure. The root mean square error is often used to give particularly large errors a stronger weight:

   RMSE(τ) = sqrt( (1/(m+1)) Σ_{j=0}^{m} f_{t_0+j}(τ)² ),   τ = 1, 2, ... .

These measures are not normalised, i.e. their size depends on the scale of the data.
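The following sketch collects, under purely illustrative assumptions, two of the evaluation tools just discussed: the Mincer-Zarnowitz regression (2.63) for one-step forecasts, here estimated with a plain least-squares call that ignores the standard-error issues mentioned above, and the scale-dependent measures MAE and RMSE. The simulated series and 'forecasts' are hypothetical, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical realisations and one-step forecasts (noisy AR(1) path and its conditional mean).
n = 120
x = np.empty(n)
x[0] = 0.0
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + rng.normal()
x_hat = 0.7 * np.roll(x, 1)        # "forecasts" alpha*x_{t-1}; first element is invalid
x, x_hat = x[1:], x_hat[1:]        # drop the invalid first pair

# Mincer-Zarnowitz regression (2.63): x_{t+1} = a0 + a1 * xhat_t(1) + error.
X = np.column_stack([np.ones_like(x_hat), x_hat])
(a0, a1), *_ = np.linalg.lstsq(X, x, rcond=None)
print(f"a0 = {a0:.3f} (should be near 0), a1 = {a1:.3f} (should be near 1)")

# Scale-dependent summary measures of the forecast errors f = x - xhat.
f = x - x_hat
mae = np.mean(np.abs(f))
rmse = np.sqrt(np.mean(f ** 2))
print(f"MAE = {mae:.3f}, RMSE = {rmse:.3f}")
```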
The inequality measure proposed by HENRY THEIL (1961) avoids this problem by comparing the actual forecasts with so-called naïve forecasts, i.e. the realised values of the last available observation,

   U(τ) = sqrt( Σ_{j=0}^{m} f_{t_0+j}(τ)² / Σ_{j=0}^{m} (x_{t_0+τ+j} − x_{t_0+j})² ),   τ = 1, 2, ... .

If U(τ) = 1, the forecast is as good as the naïve forecast x̂_t(τ) = x_t. For U(τ) < 1 the forecasts perform better than the naïve one. MAE, RMSE and Theil's U all become zero if predicted and realised values are identical over the whole forecast horizon.

Example 2.16

All these measures can also be applied to forecasts which are not generated by ARMA models, as, for example, the forecasts of the Council of Economic Experts or of the Association of German Economic Research Institutes. Since the end of the 1960s, both institutions have published forecasts of German economic development for the following year, the institutes usually in October and the Council at the end of November. HANNS MARTIN HAGEN and GEBHARD KIRCHGÄSSNER (1996) investigated the annual forecasts of the growth rate of GNP for the period from 1970 to 1995 as well as for the sub-periods from 1970 to 1982 and from 1983 to 1995. These sub-periods correspond to the social-liberal government of SPD and FDP and the conservative-liberal government of CDU/CSU and FDP, respectively. The results are given in Table 2.2. Besides the criteria given above, the table also reports the square of the correlation coefficient between realised and predicted values (R²), the estimated regression coefficient â₁ of the test equation (2.63), and the mean error (ME). According to almost all criteria, the forecasts of the Council outperform those of the institutes. This was to be expected, as the Council's forecasts are produced slightly later, at a time when more information is available. For the forecasts of both institutions, the mean absolute error, the root mean square error and Theil's U are smaller in the second period than in the first one. This is some evidence that the forecasts might have improved over time. On the other hand, the correlation coefficient between predicted and realised values has also become smaller, which indicates a deterioration of the forecasts. It has to be taken into account that the variance of the variable to be predicted was considerably smaller in the second period than in the first one. Thus, the smaller errors do not necessarily indicate improvements of the forecasts. It is also interesting to note that, on average, the forecast errors of both institutions were negative in the first and positive in the second sub-period: they tended to overestimate the development in the period of the social-liberal coalition and to underestimate it in the period of the conservative-liberal coalition.
Table 2.2: Forecasts of the Council of Economic Experts and of the Economic Research Institutes

                               Period        R²      RMSE    MAE     ME       â₁      U
Institutes                     1970 – 1995   0.369   1.838   1.346   -0.250*  1.005*  0.572
                               1970 – 1982   0.429   2.291   1.654   -0.731   1.193*  0.625
                               1983 – 1995   0.399   1.229   1.038    0.231   1.081   0.457
Council of Economic Experts    1970 – 1995   0.502*  1.647*  1.171*  -0.256   1.114   0.512*
                               1970 – 1982   0.599*  2.025*  1.477*  -0.723*  1.354   0.552*
                               1983 – 1995   0.472*  1.150*  0.865*   0.212*  1.036*  0.428*

'*' denotes the 'better' of the two forecasts.

2.5 The Relation between Econometric Models and ARMA Processes

The ARMA model-based forecasts discussed in the previous section are unconditional forecasts. The only information used to generate these forecasts is the information contained in the current and past values of the time series. There is demand for such forecasts, and, as mentioned above, one of the reasons for the development and the popularity of the Box-Jenkins methodology presented in this chapter is that, by applying the above-mentioned approaches, these predictions perform, at least partly, much better than forecasts generated by large-scale econometric models. Thus, the Box-Jenkins methodology seems to be a (possibly much better) alternative to the traditional econometric methodology.

However, this perspective is rather restricted. On the one hand, conditional rather than unconditional forecasts are required in many cases, for example, in order to evaluate the effect of a tax reform on economic growth. Such forecasts cannot be generated by using (only) univariate models. On the other hand, and more importantly, the separation of the two approaches is much less strict than it seems to be at first glance. As ARNOLD ZELLNER and FRANZ C. PALM (1974) showed, linear dynamic simultaneous equation systems as used in traditional econometrics can be transformed into ARMA models. (Inversely, multivariate time series models as discussed in the next chapters can be transformed into traditional econometric models.) The univariate ARMA models correspond to the final equations of econometric models in the terminology of JAN TINBERGEN (1940).
Let us consider a very simple model. An exogenous, weakly stationary variable x, as defined in (2.64b), has a current and lagged impact on the dependent variable y, while the error term might be autocorrelated. Thus, we get the model

(2.64a)   y_t = δ_1(L) x_t + δ_2(L) u_{1,t},
(2.64b)   α(L) x_t = β(L) u_{2,t},

where δ_1(L) and δ_2(L) are lag polynomials of finite order. If we insert (2.64b) into (2.64a), we get for y the univariate model

(2.64a')   α(L) y_t = β̃(L) v_t   with   β̃(L) v_t := δ_1(L) β(L) u_{2,t} + δ_2(L) α(L) u_{1,t}.

As β̃(L) v_t is an MA process of finite order, we get a finite order ARMA representation for y. It must be pointed out that the univariate representations of the two variables have the same finite order AR term.

References

Since the time when HERMAN WOLD developed the class of ARMA processes in his dissertation and GEORGE E.P. BOX and GWILYM M. JENKINS (1970) popularised and further developed this model class in the textbook mentioned above, there have been quite a number of textbooks dealing with these models at different technical levels.

An introduction focusing on empirical applications is, for example, to be found in

ROBERT S. PINDYCK and DANIEL L. RUBINFELD, Econometric Models and Economic Forecasts, McGraw-Hill, Boston et al., 4th edition 1998, Chapter 17f., pp. 521 – 578,

PETER J. BROCKWELL and RICHARD A. DAVIS, Introduction to Time Series and Forecasting, Springer, New York et al. 1996, as well as

TERENCE C. MILLS, Time Series Techniques for Economists, Cambridge University Press, Cambridge (England) 1990.

Contrary to this,

PETER J. BROCKWELL and RICHARD A. DAVIS, Time Series: Theory and Methods, Springer, New York et al. 1987,

give a presentation that is rigorous in terms of probability theory. Along with the respective proofs of the theorems, this textbook nevertheless contains many empirical examples.
Autoregressive processes for the residuals of an estimated regression equation were used for the first time in econometrics by

DONALD COCHRANE and GUY H. ORCUTT, Application of Least Squares Regression to Relationships Containing Autocorrelated Error Terms, Journal of the American Statistical Association 44 (1949), pp. 32 – 61.

The different information criteria to detect the order of an autoregressive process are presented in

HIROTUGU AKAIKE, Fitting Autoregressive Models for Prediction, Annals of the Institute of Statistical Mathematics 21 (1969), pp. 243 – 247,

HIROTUGU AKAIKE, A New Look at the Statistical Model Identification, IEEE Transactions on Automatic Control AC-19 (1974), pp. 716 – 723,

GIDEON SCHWARZ, Estimating the Dimension of a Model, Annals of Statistics 6 (1978), pp. 461 – 464, as well as in

EDWARD J. HANNAN and BARRY G. QUINN, The Determination of the Order of an Autoregression, Journal of the Royal Statistical Society B 41 (1979), pp. 190 – 195.

The effect of temporal aggregation on the first differences of temporal averages was first investigated by

HOLBROOK WORKING, Note on the Correlation of First Differences of Averages in a Random Chain, Econometrica 28 (1960), pp. 916 – 918,

and later on, in more detail, by

GEORGE C. TIAO, Asymptotic Behaviour of Temporal Aggregates of Time Series, Biometrika 59 (1972), pp. 525 – 531.

The approach to check the consistency of predictions was developed by

JACOB MINCER and VICTOR ZARNOWITZ, The Evaluation of Economic Forecasts, in: J. MINCER (ed.), Economic Forecasts and Expectations, National Bureau of Economic Research, New York 1969.

The use of MA processes of the forecast errors to estimate the variances of the estimated parameters was presented by

BRYAN W. BROWN and SHLOMO MAITAL, What Do Economists Know? An Empirical Study of Experts' Expectations, Econometrica 49 (1981), pp. 491 – 504.

The fact that measurement errors also play a role in rational forecasts and that, therefore, instrumental variable estimators should be used, was pointed out by

JINOOK JEONG and GANGADHARRAO S. MADDALA, Measurement Errors and Tests for Rationality, Journal of Business and Economic Statistics 9 (1991), pp. 431 – 439.
These procedures have been applied to the common forecasts of the German economic research institutes by

GEBHARD KIRCHGÄSSNER, Testing Weak Rationality of Forecasts with Different Time Horizons, Journal of Forecasting 12 (1993), pp. 541 – 558.

Moreover, the forecasts of the German Council of Economic Experts as well as those of the German Economic Research Institutes were investigated in

HANNS MARTIN HAGEN and GEBHARD KIRCHGÄSSNER, Interest Rate Based Forecasts of German Economic Growth: A Note, Weltwirtschaftliches Archiv 132 (1996), pp. 763 – 773.

The measure of inequality (Theil's U) was proposed by

HENRY THEIL, Economic Forecasts and Policy, North-Holland, Amsterdam 1961.

An alternative measure is given in

HENRY THEIL, Applied Economic Forecasting, North-Holland, Amsterdam 1966.

Today, both measures are implemented in computer programmes. Quite generally, forecasts for time series data are discussed in

CLIVE W.J. GRANGER, Forecasting in Business and Economics, Academic Press, 2nd edition 1989.

On the evaluation of the predictive accuracy of forecasts see

FRANCIS X. DIEBOLD and ROBERTO S. MARIANO, Comparing Predictive Accuracy, Journal of Business and Economic Statistics 13 (1995), pp. 253 – 263.

The relationship between time series models and econometric equation systems is analysed in

ARNOLD ZELLNER and FRANZ C. PALM, Time Series Analysis and Simultaneous Equation Econometric Models, Journal of Econometrics 2 (1974), pp. 17 – 54.

See for this also

FRANZ C. PALM, Structural Econometric Modeling and Time Series Analysis: An Integrated Approach, in: A. ZELLNER (ed.), Applied Time Series Analysis of Economic Data, U.S. Department of Commerce, Economic Research Report ER-S, Washington 1983, pp. 199 – 230.

The term final equation originates from

JAN TINBERGEN, Econometric Business Cycle Research, Review of Economic Studies 7 (1940), pp. 73 – 90.

An introduction to the solution of difference equations is given in

WALTER ENDERS, Applied Econometric Time Series, 3rd edition, Wiley, Hoboken, N.J. 2010, Chapter 1.
The permanent income hypothesis as a determinant of consumption expenditure was developed by

MILTON FRIEDMAN, A Theory of the Consumption Function, Princeton University Press, Princeton N.J. 1957.

The example of the estimated popularity function is given in

GEBHARD KIRCHGÄSSNER, Causality Testing of the Popularity Function: An Empirical Investigation for the Federal Republic of Germany, 1971 – 1982, Public Choice 45 (1985), pp. 155 – 173.