Time-Series-Analysis-with-Statsmodels - Chapter 3

Introduction to Statsmodels

Statsmodels is a Python module that provides classes and functions for estimating many different statistical models, conducting statistical tests, and exploring statistical data. An extensive list of result statistics is available for each estimator. The results are tested against existing statistical packages to ensure that they are correct. The package is released under the open source Modified BSD (3-clause) license. The online documentation is hosted at statsmodels.org.

To illustrate the code and its capabilities, I will use the following dataset throughout:

US Macroeconomic Data for 1959Q1 - 2009Q3
Number of Observations - 203
Number of Variables - 14
Variable name definitions:
    year      - 1959q1 - 2009q3
    quarter   - 1-4
    realgdp   - Real gross domestic product (Bil. of chained 2005 US$,
                seasonally adjusted annual rate)
    realcons  - Real personal consumption expenditures (Bil. of chained
                2005 US$, seasonally adjusted annual rate)
    realinv   - Real gross private domestic investment (Bil. of chained
                2005 US$, seasonally adjusted annual rate)
    realgovt  - Real federal consumption expenditures & gross investment
                (Bil. of chained 2005 US$, seasonally adjusted annual rate)
    realdpi   - Real private disposable income (Bil. of chained 2005
                US$, seasonally adjusted annual rate)
    cpi       - End of the quarter consumer price index for all urban
                consumers: all items (1982-84 = 100, seasonally adjusted).
    m1        - End of the quarter M1 nominal money stock (Seasonally
                adjusted)
    tbilrate  - Quarterly monthly average of the monthly 3-month
                treasury bill: secondary market rate
    unemp     - Seasonally adjusted unemployment rate (%)
    pop       - End of the quarter total population: all ages incl. armed
                forces over seas
    infl      - Inflation rate (ln(cpi_{t}/cpi_{t-1}) * 400)
    realint   - Real interest rate (tbilrate - infl)

You can build this DataFrame with the following code:

    import pandas as pd
    import statsmodels.api as sm
    df = sm.datasets.macrodata.load_pandas().data
    df.index = pd.Index(sm.tsa.datetools.dates_from_range('1959Q1', '2009Q3'))
    print(sm.datasets.macrodata.NOTE)        


ETS

Error/Trend/Seasonality Models

As we begin working with endogenous data ("endog" for short) and start to develop forecasting models, it helps to identify and isolate factors working within the system that influence behavior. Here "endogenous" refers to internal factors, while "exogenous" relates to external forces. ETS models fall under the category of state space models, and include decomposition (described below) and exponential smoothing (described in an upcoming section).

The decomposition of a time series attempts to isolate individual components such as error, trend, and seasonality (ETS). The simplest version of this idea splits a series into a trendline and a cyclical component that maps the observed data back to the trend.

Related Function:

statsmodels.tsa.seasonal.seasonal_decompose(x, model)  


Seasonal Decomposition

Statsmodels provides a seasonal decomposition tool we can use to separate out the different components. This lets us see quickly and visually what each component contributes to the overall behavior.

We apply an additive model when the trend looks roughly linear and the seasonal and trend components appear constant over time (e.g. every year we add 10,000 passengers). A multiplicative model is more appropriate when the series increases (or decreases) at a non-linear rate (e.g. each year we double the number of passengers).

For these examples we'll use the International Airline Passengers dataset, which gives monthly totals in thousands from January 1949 to December 1960.

Moving Averages

Simple Moving Average

We can create a simple moving average (SMA) by applying a mean function to a rolling window.
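
A minimal sketch of that rolling mean with pandas follows. The series and the 6-month window are hypothetical choices for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series used only for illustration.
idx = pd.date_range("2000-01-01", periods=24, freq="MS")
series = pd.Series(np.arange(24, dtype=float), index=idx)

# 6-month simple moving average: the mean of the current value
# and the five values before it.
sma_6 = series.rolling(window=6).mean()

# The first five entries are NaN because the window is not yet full.
print(sma_6.head(8))
```

A wider window gives a smoother line but leaves more missing values at the start and reacts more slowly to recent changes.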


Holt-Winters Methods

In the previous section on Exponentially Weighted Moving Averages (EWMA) we applied Simple Exponential Smoothing using just one smoothing factor α (alpha). This failed to account for other contributing factors like trend and seasonality.
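
For reference, Simple Exponential Smoothing with a single α can be sketched with the pandas `ewm` method. The data and the value of `alpha` below are arbitrary illustrative choices; `adjust=False` selects the recursive form in which each smoothed value is `alpha * x_t + (1 - alpha) * previous_smoothed_value`.

```python
import pandas as pd

# Small made-up series, for illustration only.
series = pd.Series([10.0, 12.0, 11.0, 15.0, 14.0])
alpha = 0.5  # illustrative smoothing factor, not a recommended value

# adjust=False uses the recursion
#   s_0 = x_0,  s_t = alpha * x_t + (1 - alpha) * s_{t-1}
ses = series.ewm(alpha=alpha, adjust=False).mean()

print(ses.tolist())
```

The smoothed values follow the level of the data but carry no information about slope or repeating patterns, which is the gap the methods below address.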

In this section we'll look at Double and Triple Exponential Smoothing with the Holt-Winters Methods.

In Double Exponential Smoothing (aka Holt's Method) we introduce a new smoothing factor β (beta) that addresses trend.

Because we haven't yet considered seasonal fluctuations, the forecasting model is simply a straight sloped line extending from the most recent data point.

With Triple Exponential Smoothing (aka the Holt-Winters Method) we introduce a smoothing factor γ (gamma) that addresses seasonality.

