Time-Series-Analysis-with-Statsmodels

Introduction to Statsmodels

Statsmodels is a Python module that provides classes and functions for estimating many different statistical models, as well as for conducting statistical tests and exploring statistical data. An extensive list of result statistics is available for each estimator. The results are tested against existing statistical packages to ensure that they are correct. The package is released under the open source Modified BSD (3-clause) license. The online documentation is hosted at statsmodels.org.

To demonstrate the different functions and capabilities, I will use the following dataset:

US Macroeconomic Data for 1959Q1 - 2009Q3
Number of Observations - 203
Number of Variables - 14
Variable name definitions:
    year      - 1959q1 - 2009q3
    quarter   - 1-4
    realgdp   - Real gross domestic product (Bil. of chained 2005 US$,
                seasonally adjusted annual rate)
    realcons  - Real personal consumption expenditures (Bil. of chained
                2005 US$, seasonally adjusted annual rate)
    realinv   - Real gross private domestic investment (Bil. of chained
                2005 US$, seasonally adjusted annual rate)
    realgovt  - Real federal consumption expenditures & gross investment
                (Bil. of chained 2005 US$, seasonally adjusted annual rate)
    realdpi   - Real private disposable income (Bil. of chained 2005
                US$, seasonally adjusted annual rate)
    cpi       - End of the quarter consumer price index for all urban
                consumers: all items (1982-84 = 100, seasonally adjusted).
    m1        - End of the quarter M1 nominal money stock (Seasonally
                adjusted)
    tbilrate  - Quarterly monthly average of the monthly 3-month
                treasury bill: secondary market rate
    unemp     - Seasonally adjusted unemployment rate (%)
    pop       - End of the quarter total population: all ages incl. armed
                forces overseas
    infl      - Inflation rate (ln(cpi_{t}/cpi_{t-1}) * 400)
    realint   - Real interest rate (tbilrate - infl)

You can load the same data into a pandas DataFrame with the following code:

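    import pandas as pd
    import statsmodels.api as sm
    # Load the bundled US macroeconomic dataset as a DataFrame
    df = sm.datasets.macrodata.load_pandas().data
    # Replace the default integer index with quarterly dates (1959Q1 to 2009Q3)
    df.index = pd.Index(sm.tsa.datetools.dates_from_range('1959Q1', '2009Q3'))
    # Print the dataset's description (variables, units, source)
    print(sm.datasets.macrodata.NOTE)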

ETS

Error/Trend/Seasonality Models

As we begin working with endogenous data ("endog" for short) and start to develop forecasting models, it helps to identify and isolate the factors within the system that influence its behavior. Here "endogenous" refers to internal factors, while "exogenous" refers to external forces. These models fall under the category of state space models and include decomposition (described below) and exponential smoothing (described later in this chapter).

The decomposition of a time series attempts to isolate individual components such as error, trend, and seasonality (ETS). In an earlier, simpler example, we separated data into only a trendline and a cyclical component that mapped the observed data back to the trend.

Related Function:

statsmodels.tsa.seasonal.seasonal_decompose(x, model)

Seasonal Decomposition

Statsmodels provides a seasonal decomposition tool we can use to separate out the different components. This lets us see quickly and visually what each component contributes to the overall behavior.

We apply an additive model when the trend grows by a roughly constant amount and the seasonal swings stay about the same size over time (e.g., every year we add 10,000 passengers). A multiplicative model is more appropriate when the series increases (or decreases) at a non-linear rate (e.g., each year we double the number of passengers).
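Written as equations, with $y_t$ the observed value, $T_t$ the trend, $S_t$ the seasonal component, and $R_t$ the residual (error) at time $t$ (notation chosen here for illustration), the two models are:

```latex
\begin{aligned}
\text{Additive:} \quad & y_t = T_t + S_t + R_t \\
\text{Multiplicative:} \quad & y_t = T_t \times S_t \times R_t \\
\text{Log of multiplicative:} \quad & \log y_t = \log T_t + \log S_t + \log R_t
\end{aligned}
```

The last line shows why a multiplicative series is sometimes log-transformed and then decomposed additively.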

For these examples we'll use the International Airline Passengers dataset, which gives monthly totals in thousands from January 1949 to December 1960.

Moving Averages

Simple Moving Average

We can create a simple moving average (SMA) by applying a mean function to a rolling window of fixed size.
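In pandas, that is `Series.rolling(window).mean()`. The sketch below uses a short made-up series and a 3-period window so the result can be checked by hand.

```python
import pandas as pd

# Short made-up series so the averages can be checked by hand
s = pd.Series([10.0, 12.0, 11.0, 13.0, 15.0, 14.0, 16.0])

# 3-period SMA: the mean of each window of 3 consecutive values.
# The first 2 entries are NaN because the window is not yet full.
sma = s.rolling(window=3).mean()
print(sma.tolist())  # [nan, nan, 11.0, 12.0, 13.0, 14.0, 15.0]
```

With the macrodata DataFrame loaded earlier, `df['realgdp'].rolling(window=4).mean()` would give a four-quarter (one-year) SMA of real GDP.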

Holt-Winters Methods

In the previous section on Exponentially Weighted Moving Averages (EWMA) we applied Simple Exponential Smoothing using just one smoothing factor α (alpha). This failed to account for other contributing factors like trend and seasonality.
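For reference, simple exponential smoothing updates a single level with one factor α: each new smoothed value is α times the latest observation plus (1 - α) times the previous smoothed value. The sketch below checks that pandas' `ewm(alpha=..., adjust=False)` follows exactly that recursion on a small made-up series. statsmodels offers the same model as `SimpleExpSmoothing` in `statsmodels.tsa.holtwinters`, which can also estimate α from the data.

```python
import numpy as np
import pandas as pd

# Made-up series and a fixed smoothing factor (illustrative values)
s = pd.Series([3.0, 5.0, 4.0, 6.0, 8.0])
alpha = 0.5

# With adjust=False, pandas uses the simple exponential smoothing recursion:
#   level[0] = y[0]
#   level[t] = alpha * y[t] + (1 - alpha) * level[t-1]
ewma = s.ewm(alpha=alpha, adjust=False).mean()

# The same recursion written out by hand
level = [s.iloc[0]]
for y in s.iloc[1:]:
    level.append(alpha * y + (1 - alpha) * level[-1])

print(ewma.tolist())  # [3.0, 4.0, 4.0, 5.0, 6.5]
print(np.allclose(ewma, level))  # True
```

Because there is no trend or seasonal term, a forecast from this model simply repeats the last smoothed level.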

In this section we'll look at Double and Triple Exponential Smoothing with the Holt-Winters Methods.

In Double Exponential Smoothing (aka Holt's Method) we introduce a new smoothing factor β (beta) that addresses trend.
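One common way to write Holt's method, using a level $\ell_t$, a trend (slope) $b_t$, and the two smoothing factors α and β (standard textbook notation, not taken from this article), is:

```latex
\begin{aligned}
\ell_t &= \alpha\, y_t + (1 - \alpha)(\ell_{t-1} + b_{t-1}) && \text{(level)} \\
b_t &= \beta\,(\ell_t - \ell_{t-1}) + (1 - \beta)\, b_{t-1} && \text{(trend)} \\
\hat{y}_{t+h} &= \ell_t + h\, b_t && \text{(forecast $h$ steps ahead)}
\end{aligned}
```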

Because we haven't yet considered seasonal fluctuations, the forecasting model is simply a straight sloped line extending from the most recent data point.
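To see the straight-line forecast concretely, here is a from-scratch sketch of Holt's linear method in NumPy. The function name `holt_forecast`, the fixed α and β values, and the initialization (level = first value, trend = first difference) are assumptions for illustration; statsmodels' `Holt` and `ExponentialSmoothing(..., trend="add")` classes implement the same model and estimate the parameters for you.

```python
import numpy as np


def holt_forecast(y, alpha, beta, horizon):
    """Minimal sketch of Holt's linear (double exponential) smoothing.

    Hypothetical helper; initialization is an assumption:
    level = y[0], trend = y[1] - y[0].
    """
    level, trend = y[0], y[1] - y[0]
    for value in y[1:]:
        prev_level = level
        # Level: blend the new observation with the previous one-step forecast
        level = alpha * value + (1 - alpha) * (prev_level + trend)
        # Trend: blend the latest change in level with the previous trend
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # The h-step forecast is a straight line: level + h * trend
    steps = np.arange(1, horizon + 1)
    return level + steps * trend


y = np.array([10.0, 12.0, 14.5, 16.0, 18.5, 20.0])
fc = holt_forecast(y, alpha=0.8, beta=0.2, horizon=4)
print(fc)
```

Each step of the returned forecast differs from the previous one by the same amount, the final trend estimate.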

With Triple Exponential Smoothing (aka the Holt-Winters Method) we introduce a third smoothing factor, γ (gamma), that addresses seasonality.

Time-Series-Analysis-with-Statsmodels - Chapter 3

Junaid .

Data Scientist at Trinity Life Sciences - Generative AI Engineer -Applied AI
