Time Series Episode 6: Battle of forecasting algorithms in “Darts”
Introduction
Hi there! Happy to see you again in this series of articles, where we discuss Time Series theory and examples.
In the previous articles we discussed ARIMA-family models for forecasting, along with working examples of how to apply them, based on my experience from multiple projects so far.
In the last article, though, I presented the “Darts” library in Python, which contains many forecasting algorithms and makes it very easy to do feature engineering, apply and compare whatever models you want, and evaluate the results in the end (you can learn more here).
So, in this story I will work on a dataset we have already discussed, this time trying different forecasting algorithms from Darts, so that you get an understanding of how the library works and how easy it is to train 7 different models.
You ready? Let’s start!
Step-by-Step Working Example
We begin with the dataset described in my second hands-on tutorial on LinkedIn:
Time Series Episode 2: What happens with strong seasonality
After investigation of parameters and multiple iterations, we had arrived at this ARIMA model predicting the last 12 months (1 year):
SARIMA(2,1,2)(1,1,0,15)
Well, now that we will try more methods than ARIMA, let’s make it more interesting and forecast the last 4 years (48 months), shall we?
Step 1: Read and transform the data
We begin by reading the data from a CSV file, transforming it into the special “TimeSeries” object required by Darts, plotting the series and checking for seasonality:
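Here is a minimal sketch of what this could look like (the file name and column names are my assumptions, so adapt them to your copy of the Sunspots data):

```python
import pandas as pd
from darts import TimeSeries
from darts.utils.statistics import check_seasonality

# File and column names are assumptions; adapt them to your copy of the dataset
df = pd.read_csv("sunspots.csv")
series = TimeSeries.from_dataframe(df, time_col="Month", value_cols="Sunspots")

# Plot the full series
series.plot()

# Look for a seasonal period of up to 20 years (240 months)
is_seasonal, period = check_seasonality(series, max_lag=240, alpha=0.05)
print(is_seasonal, period)
```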
Darts provides lots and lots of useful capabilities for data engineering and statistical tests, along with all the forecasting models.
The “check_seasonality” test tells us that the time series has a seasonality of 124 months. This makes sense if you look closely at the plot, where you can see the same pattern repeating itself roughly every 10 years, so nearly 120 months.
Ok, we have a seasonality of around a decade. But is it only that? Usually, with monthly data, there is also a yearly seasonality. So we might have more Sunspots during summer, for example.
Or there might be seasonality within a decade, maybe with more Sunspots every two years. Who knows?
We can explore some cases (a small slicing sketch follows the list):
1. Inspect seasonality from 1940 until 1941 (within a YEAR)
2. Inspect seasonality from 1950 until 1951 (within a YEAR)
3. Inspect seasonality from 1940 until 1950 (within a DECADE)
4. Inspect seasonality from 1950 until 1960 (within a DECADE)
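As a sketch, slicing the series by dates and re-running the seasonality test could look like this (the dates and max_lag values are illustrative; the same pattern applies to the yearly windows):

```python
import pandas as pd
from darts.utils.statistics import check_seasonality

# Slice two decades out of the series
decade_40s = series.slice(pd.Timestamp("1940-01-01"), pd.Timestamp("1950-01-01"))
decade_50s = series.slice(pd.Timestamp("1950-01-01"), pd.Timestamp("1960-01-01"))

# Check for seasonality within each window (max_lag must stay below the window length)
print(check_seasonality(decade_40s, max_lag=60))
print(check_seasonality(decade_50s, max_lag=60))
```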
Well, debatable results. Sometimes we have seasonality within a year, sometimes not. And the seasonality within one decade might differ from another decade (you can try more examples, but that’s the general concept here).
So, the strongest seasonality we have is 124 months, which is also visible in Image 2, and we are going to use it in our models.
Step 2: Train multiple models
We start by setting the training and validation sets:
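A minimal way to do this in Darts is to hold out the last 48 months:

```python
# Hold out the last 4 years (48 months) for validation
train, val = series[:-48], series[-48:]
print(len(train), len(val))
```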
2.1 Simple Exponential Smoothing (SES), Theta and Linear Regression
Let’s start with SES, which gives the most weight to the most recent past data:
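A sketch with Darts’ ExponentialSmoothing wrapper (left at its defaults here; the wrapper covers the whole exponential smoothing family, so for a strict SES you could also disable the trend and seasonal components):

```python
from darts.models import ExponentialSmoothing

ses = ExponentialSmoothing()     # wraps statsmodels' exponential smoothing family
ses.fit(train)
pred_ses = ses.predict(len(val))
```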
We also train a Theta model, which is similar to SES but applies the two theta lines to capture the general seasonality and the recent trend:
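A sketch with the default theta parameter:

```python
from darts.models import Theta

theta = Theta()                  # default theta value
theta.fit(train)
pred_theta = theta.predict(len(val))
```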
Although it is more suitable for tabular datasets, we also try Linear Regression, using some lags of the Time Series as the predictors.
As we said in the previous article, you should pay attention to the parameter “output_chunk_length”, which is the number of time steps predicted at once (per chunk) by the internal model.
Basically, this is how many months are predicted at a time. So, in the following code snippet, we use 124 lags (i.e. 124 months in the past) to predict the next 20 months at once (output_chunk_length = 20):
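A sketch of that configuration:

```python
from darts.models import LinearRegressionModel

# 124 past months as predictors, 20 months predicted per chunk
lr = LinearRegressionModel(lags=124, output_chunk_length=20)
lr.fit(train)
pred_lr = lr.predict(len(val))   # the 48-month horizon is filled in rolling chunks of 20
```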
Now let’s plot the results and evaluate with MAPE (Mean Absolute Percentage Error):
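Something along these lines (variable names follow the earlier sketches):

```python
import matplotlib.pyplot as plt
from darts.metrics import mape

# Plot the actual values against each forecast
val.plot(label="actual")
pred_ses.plot(label="SES")
pred_theta.plot(label="Theta")
pred_lr.plot(label="Linear Regression")
plt.legend()
plt.show()

# Evaluate each forecast with MAPE
for name, pred in [("SES", pred_ses), ("Theta", pred_theta), ("Linear Regression", pred_lr)]:
    print(f"{name}: MAPE = {mape(val, pred):.2f}%")
```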
All these models seem to work fine and perform very similarly.
2.2 Random Forest
Again, this is a great algorithm for tabular data, using lags as predictors.
And, as such, it doesn’t understand the “order of the lags”: even if it considers, e.g., lags 1, 2, 3, 4, it doesn’t know that lag 1 is more recent than lag 2, which is more recent than lag 3, and so on. So you lose the sequential nature of the Time Series, as we discussed in the previous article.
Let’s try it, though, with a model that considers the values 1 month, 12 months and 124 months ago, to see how it behaves:
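A sketch of that lag configuration:

```python
from darts.metrics import mape
from darts.models import RandomForest

# Negative lags: the values 1, 12 and 124 months before the prediction point
rf = RandomForest(lags=[-1, -12, -124], output_chunk_length=20)
rf.fit(train)
pred_rf = rf.predict(len(val))
print(f"Random Forest: MAPE = {mape(val, pred_rf):.2f}%")
```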
According to its MAPE, it behaves worse than the previous models. Of course, you can try with other configurations (hyperparameters or lags) here, but this is the general result.
2.3 RNN
We discussed and explained RNNs briefly in the previous article, so you can refer to that to get a brief understanding.
The main parameters here:
training_length: the number of data points (months) used for training; it is the length of both the input and output series used during training. Here we set it to 240, i.e. 20 years per training sequence (2 decades, to capture the seasonality).
input_chunk_length: the number of past data points (months) used for each prediction. Here, the RNN will look back 120 months from the moment of prediction, so it takes in the full seasonal cycle of the last decade to compute predictions for the next 4 years. If we increase this value, we force the RNN to rely more on its long-term memory. However, this parameter should not exceed the previous one (training_length).
n_epochs: the number of epochs over which to train the model, set here to 200 to decrease the error (but don’t overdo it, or the model will overfit).
batch_size: the number of time series (input and output sequences) used in each training pass, set here at 8.
So what are we doing? Take notes! 🤓
We train on sub-timeseries of length 240 and, in order to predict the next 48 months, we utilize the last 120 months of the dataset.
This means that all of the sub-timeseries we will train on will be of length 240. And we have a training dataset of length 752. How does this all connect?
The 1st subset will be the first 240 data points: from index 0 to 239.
The next one will be from index 1 to 240 and so on.
The last one will be from index 751 (the last index of the 752-point dataset) − 240 = 511 until 751.
So, the training subsets start at indices 0 up to 511, which gives us 512 sub-timeseries in total.
This means that, by setting batch_size=20, we will have 512/20=25.6 batches per epoch.
This also means that the first 25 batches will be full and we will need one extra batch for the remaining data, so 26 batches in total per epoch.
So, basically, in every epoch we train 26 batches. The first batch will have the first 20 timeseries out of the 512, and so on.
We also need to scale the data to the 0–1 range, since neural networks work best that way, and we will inverse-transform the predictions afterwards.
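Putting it together, a sketch of the scaling and the RNN setup (the LSTM cell choice and the random_state are my assumptions; the batch_size follows the walkthrough above):

```python
from darts.dataprocessing.transformers import Scaler
from darts.models import RNNModel

# Scale the training data to the 0-1 range
scaler = Scaler()
train_scaled = scaler.fit_transform(train)

rnn = RNNModel(
    model="LSTM",             # assumption: an LSTM cell ("RNN" and "GRU" are also available)
    input_chunk_length=120,   # look back one full ~decade cycle
    training_length=240,      # two decades per training sequence
    n_epochs=200,
    batch_size=20,            # as in the batch walkthrough above
    random_state=42,
)
rnn.fit(train_scaled)

# Forecast 48 months and map the result back to the original scale
pred_rnn = scaler.inverse_transform(rnn.predict(len(val)))
```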
But what if we could also give the model information about the month?
Meaning, we create covariates that will be known when the model predicts the future months (called “future covariates”). Darts makes this easy. We will start by creating a timeseries for the year and one for the months as a binary (one-hot) encoding. In the end we will use only the month covariates, because the years are not going to repeat in the future, so we don’t need them:
This way, we provide the model with information about which month it is (is it January? 0 or 1, and so on) and include it in the model training:
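A sketch of how those month covariates could be created and passed to the model (reusing the scaled series and scaler from before):

```python
from darts.models import RNNModel
from darts.utils.timeseries_generation import datetime_attribute_timeseries

# Year covariate (created for illustration, not used in the end)
year_cov = datetime_attribute_timeseries(series, attribute="year")

# One-hot (binary) encoded month covariate, built over the full series so it is
# also known over the forecast horizon (a "future covariate")
month_cov = datetime_attribute_timeseries(series, attribute="month", one_hot=True)

rnn_cov = RNNModel(
    model="LSTM",
    input_chunk_length=120,
    training_length=240,
    n_epochs=200,
    random_state=42,
)
rnn_cov.fit(train_scaled, future_covariates=month_cov)
pred_rnn_cov = scaler.inverse_transform(
    rnn_cov.predict(len(val), future_covariates=month_cov)
)
```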
So it seems the information about months didn’t help the model, and the univariate timeseries RNN works better in this case.
2.4 TBATS
TBATS models stand for:
Trigonometric components are used to model multiple seasonal patterns, especially non-integer and complex seasonalities.
Box-Cox transformation is applied to stabilize the variance in the data.
ARMA (AutoRegressive Moving Average) errors model captures short-term autocorrelations in the residuals.
Trend components (level and slope) capture long-term trends in the data.
Seasonal components can represent multiple overlapping seasonal cycles (e.g., daily and yearly).
They are appropriate for modeling complex seasonal time series, such as those with multiple seasonal periods, high-frequency seasonality, non-integer seasonality and dual-calendar effects.
Darts lets us provide different seasonalities, as well as choose whether to include Trend, ARMA errors and/or the Box-Cox transformation in the modeling. The model can try all combinations (include trend or not, include the transformation or not, etc.) and select the one with the best performance:
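A sketch, passing the 124-month seasonality and letting the model search the remaining options (as far as I recall, None lets the underlying tbats package try both variants and keep the better fit):

```python
from darts.models import TBATS

tbats = TBATS(
    seasonal_periods=[124],   # the strongest seasonality we detected
    use_trend=None,           # None: try with and without a trend
    use_box_cox=None,         # None: try with and without the Box-Cox transformation
    use_arma_errors=True,     # model residuals with ARMA if it improves the fit
)
tbats.fit(train)
pred_tbats = tbats.predict(len(val))
```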
Well, this one is not working well on this dataset 😅. Let’s move on to the others!
2.5 N-BEATS / N-HiTS
Both models take a neural network approach designed for time series forecasting, built on stacks of fully connected layers (MLPs) that learn different aspects of the series, such as trend and seasonality.
Starting with N-BEATS (standing for “Neural Basis Expansion Analysis Time Series”), it uses a stack of fully connected layers (MLPs) organized into blocks. Each block learns different aspects of the time series data, such as trends, seasonality, or other patterns.
It is also highly interpretable, through its basis expansion approach, where each block can be designed to focus on specific components of the time series, making the outputs understandable.
The model makes forecasts by recursively predicting future values, allowing it to forecast multiple steps ahead effectively.
Now, N-HiTS (“Neural Hierarchical Interpolation for Time Series”) is an improvement over N-BEATS, that introduces a hierarchical decomposition strategy, where forecasts are generated at different scales or resolutions. This helps in capturing both short-term and long-term patterns in the data more effectively.
It uses a combination of interpolation and extrapolation techniques to better estimate future values of a time series. This approach allows N-HiTS to handle missing data and make robust forecasts over different time horizons.
Like N-BEATS, N-HiTS decomposes the time series into different components (such as trend, seasonality, and residuals), but does so in a more structured, hierarchical manner, which helps capture the multi-scale nature of time series data. As a result, it has fewer parameters to train and is consequently faster than N-BEATS, usually with better performance as well.
Now, let’s see how it is implemented in Darts. We are going to train the model for 80 epochs (like with the RNNs) and forecast 20 values at once (that’s the “output_chunk_length”). To do this, the model will look back at the previous 100 values (that’s the “input_chunk_length”), i.e. 5 times the output length.
All the remaining hyperparameters (“num_stacks”, “num_blocks” and “num_layers” being the most important) stay at their defaults, as I got the best overall performance with them after multiple iterations.
The validation set is larger than 20, so the first 20 predictions will be used to forecast the next ones, and so on.
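A sketch of both models with that configuration (the random_state is an assumption; the scaled series and scaler from the RNN section are reused):

```python
from darts.models import NBEATSModel, NHiTSModel

nbeats = NBEATSModel(
    input_chunk_length=100,
    output_chunk_length=20,
    n_epochs=80,
    random_state=42,
)
nbeats.fit(train_scaled)
pred_nbeats = scaler.inverse_transform(nbeats.predict(len(val)))

nhits = NHiTSModel(
    input_chunk_length=100,
    output_chunk_length=20,
    n_epochs=80,
    random_state=42,
)
nhits.fit(train_scaled)
pred_nhits = scaler.inverse_transform(nhits.predict(len(val)))
```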
Step 3: Evaluate results
Now we are able to compare all models and evaluate their performance on the same dataset:
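One way to compare everything at once (the names follow the sketches above):

```python
from darts.metrics import mape

predictions = {
    "SES": pred_ses,
    "Theta": pred_theta,
    "Linear Regression": pred_lr,
    "Random Forest": pred_rf,
    "RNN": pred_rnn,
    "RNN + month covariates": pred_rnn_cov,
    "TBATS": pred_tbats,
    "N-BEATS": pred_nbeats,
    "N-HiTS": pred_nhits,
}

for name, pred in predictions.items():
    print(f"{name:>24}: MAPE = {mape(val, pred):.2f}%")
```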
The best results came from the RNN. They might not be great in absolute terms, but don’t forget we are forecasting 4 years ahead! There is plenty of uncertainty here.
Conclusion
In this article we experimented with multiple models from the Darts library in Python, trying to forecast a simple Time Series dataset.
The main goal of this article was to get you familiar with Darts, some of its well-known models and their configuration, rather than to optimize the results. With more careful configuration, we could improve performance even further!
Some things to keep in mind before you go:
Darts provides an easy-to-use framework to transform your Time Series and apply multiple models, ranging from statistical ones up to neural networks.
Hyperparameters like training_length, input/output_chunk_length can heavily affect the training and prediction phases, because they define how many data points are considered to predict some other data points. So, please don’t neglect them.
Covariates don’t always help.
Results depend on both model and nature of the data.
I am also using this library in my day-to-day work for Time Series projects and exploring its potential for data handling and modeling, so I am learning with you along the way!
In future articles we will explore Darts with external covariates to understand how to better utilize them and improve our modeling performance.
Thanks for reading!