Time Series Episode 6: Battle of forecasting algorithms in “Darts”
Introduction
Hi there! Happy to see you again in this series of articles, where we discuss Time Series theory and examples.
In the previous articles we discussed ARIMA-family models for forecasting, along with working examples of how to apply them, based on my experience from multiple projects so far.
In the last article, though, I presented the “Darts” library in Python, which contains many forecasting algorithms and makes it very easy to do feature engineering, apply and compare whatever models you want, and evaluate the results in the end (you can learn more here).
So, in this story I will work on a dataset we have already discussed, this time trying different forecasting algorithms from Darts, so that you get an understanding of how the library works and how easy it is to train 7 different models.
You ready? Let’s start!
Step-by-Step Working Example
We begin with the dataset described in my second hands-on tutorial on LinkedIn:
Time Series Episode 2: What happens with strong seasonality
After investigation of parameters and multiple iterations, we had arrived at this ARIMA model predicting the last 12 months (1 year):
SARIMA(2,1,2)(1,1,0,15)
Well, now that we will try more methods than ARIMA, let’s make it more interesting and forecast the last 4 years (48 months), shall we?
Step 1: Read and transform the data
We begin by reading the data from a CSV file, transforming it into the special “TimeSeries” object required by Darts, plotting the series and checking for seasonality:
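Here is a minimal sketch of what this could look like (the file name and column names are my assumptions, so adapt them to your copy of the Sunspots data):

```python
import pandas as pd
from darts import TimeSeries
from darts.utils.statistics import check_seasonality

# File and column names are assumptions; adapt them to your copy of the dataset
df = pd.read_csv("sunspots.csv")
series = TimeSeries.from_dataframe(df, time_col="Month", value_cols="Sunspots")

# Plot the full series
series.plot()

# Look for a seasonal period of up to 20 years (240 months)
is_seasonal, period = check_seasonality(series, max_lag=240, alpha=0.05)
print(is_seasonal, period)
```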
Darts provides lots and lots of useful capabilities for data engineering and statistical tests, along with all the forecasting models.
The “check_seasonality” test tells us that the time series has a seasonality of 124 months. This makes sense if you look closely at the plot, where you can see the same pattern repeating itself roughly every 10 years, so nearly 120 months.
Ok, we have a seasonality of around a decade. But is it only that? Usually, with monthly data, there is also a yearly seasonality. So we might have more Sunspots during summer, for example.
Or there might be seasonality within a decade, maybe with more Sunspots every two years. Who knows?
We can explore some cases (a small slicing sketch follows the list):
1. Inspect seasonality from 1940 until 1941 (within a YEAR)
2. Inspect seasonality from 1950 until 1951 (within a YEAR)
3. Inspect seasonality from 1940 until 1950 (within a DECADE)
4. Inspect seasonality from 1950 until 1960 (within a DECADE)
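As a sketch, slicing the series by dates and re-running the seasonality test could look like this (the dates and max_lag values are illustrative; the same pattern applies to the yearly windows):

```python
import pandas as pd
from darts.utils.statistics import check_seasonality

# Slice two decades out of the series
decade_40s = series.slice(pd.Timestamp("1940-01-01"), pd.Timestamp("1950-01-01"))
decade_50s = series.slice(pd.Timestamp("1950-01-01"), pd.Timestamp("1960-01-01"))

# Check for seasonality within each window (max_lag must stay below the window length)
print(check_seasonality(decade_40s, max_lag=60))
print(check_seasonality(decade_50s, max_lag=60))
```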
Well, debatable results. Sometimes we have seasonality within a year, sometimes not. And the seasonality within one decade might differ from another decade (you can try more examples, but that’s the general concept here).
So, the strongest seasonality we have is 124 months, which is also visible in Image 2, and we are going to use it in our models.
Step 2: Train multiple models
We start by setting the training and validation sets:
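A minimal way to do this in Darts is to hold out the last 48 months:

```python
# Hold out the last 4 years (48 months) for validation
train, val = series[:-48], series[-48:]
print(len(train), len(val))
```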
2.1 Simple Exponential Smoothing (SES), Theta and Linear Regression
Let’s start with SES, which gives the most weight to the most recent past data:
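A sketch with Darts’ ExponentialSmoothing wrapper (left at its defaults here; the wrapper covers the whole exponential smoothing family, so for a strict SES you could also disable the trend and seasonal components):

```python
from darts.models import ExponentialSmoothing

ses = ExponentialSmoothing()     # wraps statsmodels' exponential smoothing family
ses.fit(train)
pred_ses = ses.predict(len(val))
```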
We also train a Theta model, which is similar to SES but applies the two theta lines to capture the general seasonality and the recent trend:
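A sketch with the default theta parameter:

```python
from darts.models import Theta

theta = Theta()                  # default theta value
theta.fit(train)
pred_theta = theta.predict(len(val))
```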
Although it is more suitable for tabular datasets, we also try Linear Regression, using some lags of the Time Series as the predictors.
As we said in the previous article, you should pay attention to the parameter “output_chunk_length”, which is the number of time steps predicted at once (per chunk) by the internal model.
Basically, this is how many months are predicted at a time. So, in the following code snippet, we use 124 lags (i.e. 124 months in the past) to predict the next 20 months at once (output_chunk_length = 20):
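A sketch of that configuration:

```python
from darts.models import LinearRegressionModel

# 124 past months as predictors, 20 months predicted per chunk
lr = LinearRegressionModel(lags=124, output_chunk_length=20)
lr.fit(train)
pred_lr = lr.predict(len(val))   # the 48-month horizon is filled in rolling chunks of 20
```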
Now let’s plot the results and evaluate with MAPE (Mean Absolute Percentage Error):
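Something along these lines (variable names follow the earlier sketches):

```python
import matplotlib.pyplot as plt
from darts.metrics import mape

# Plot the actual values against each forecast
val.plot(label="actual")
pred_ses.plot(label="SES")
pred_theta.plot(label="Theta")
pred_lr.plot(label="Linear Regression")
plt.legend()
plt.show()

# Evaluate each forecast with MAPE
for name, pred in [("SES", pred_ses), ("Theta", pred_theta), ("Linear Regression", pred_lr)]:
    print(f"{name}: MAPE = {mape(val, pred):.2f}%")
```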
All these models seem to work fine and perform very similarly.
2.2 Random Forest
Again, this is a great algorithm for tabular data, using lags as predictors.
And, as such, it doesn’t understand the “order of the lags”: even if it considers, e.g., lags 1, 2, 3, 4, it doesn’t know that lag 1 is more recent than lag 2, which is more recent than lag 3, and so on. So you lose the sequential nature of the Time Series, as we discussed in the previous article.
Let’s try it, though, with a model that considers the values 1 month, 12 months and 124 months ago, to see how it behaves:
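A sketch of that lag configuration:

```python
from darts.metrics import mape
from darts.models import RandomForest

# Negative lags: the values 1, 12 and 124 months before the prediction point
rf = RandomForest(lags=[-1, -12, -124], output_chunk_length=20)
rf.fit(train)
pred_rf = rf.predict(len(val))
print(f"Random Forest: MAPE = {mape(val, pred_rf):.2f}%")
```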
According to its MAPE, it behaves worse than the previous models. Of course, you can try with other configurations (hyperparameters or lags) here, but this is the general result.
2.3 RNN
We discussed and explained RNNs briefly in the previous article, so you can refer to that to get a brief understanding.
The main parameters here:
training_length: the number of data points (months) used for training; it is the length of both the input and output series used during training. Here we set it to 240, i.e. 20 years per training sequence (2 decades, to capture the seasonality).
input_chunk_length: the number of past data points (months) used for each prediction. Here, the RNN will look back 120 months from the moment of prediction, so it takes in the full seasonal cycle of the last decade to compute predictions for the next 4 years. If we increase this value, we force the RNN to rely more on its long-term memory. However, this parameter should not exceed the previous one (training_length).
n_epochs: the number of epochs over which to train the model, set here to 200 to decrease the error (but don’t overdo it, or the model will overfit).
batch_size: the number of time series (input and output sequences) used in each training pass, set here at 8.
So what are we doing? Take notes! 🤓
We train on sub-timeseries of length 240 and, in order to predict the next 48 months, we utilize the last 120 months of the dataset.
This means that all of the sub-timeseries we will train on will be of length 240. And we have a training dataset of length 752. How does this all connect?
The 1st subset will be the first 240 data points: from index 0 to 239.
The next one will be from index 1 to 240 and so on.
The last one will be from index 751 (the last index of the 752-point dataset) − 240 = 511 until 751.
So, the training subsets start at indices 0 up to 511, which gives us 512 sub-timeseries in total.
This means that, by setting batch_size=20, we will have 512/20=25.6 batches per epoch.
This also means that the first 25 batches will be full and we will need one extra batch for the remaining data, so 26 batches in total per epoch.
So, basically, in every epoch we train 26 batches. The first batch will have the first 20 timeseries out of the 512, and so on.
We also need to scale the data to the 0–1 range, since neural networks work best that way, and we will inverse-transform the predictions afterwards.
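Putting it together, a sketch of the scaling and the RNN setup (the LSTM cell choice and the random_state are my assumptions; the batch_size follows the walkthrough above):

```python
from darts.dataprocessing.transformers import Scaler
from darts.models import RNNModel

# Scale the training data to the 0-1 range
scaler = Scaler()
train_scaled = scaler.fit_transform(train)

rnn = RNNModel(
    model="LSTM",             # assumption: an LSTM cell ("RNN" and "GRU" are also available)
    input_chunk_length=120,   # look back one full ~decade cycle
    training_length=240,      # two decades per training sequence
    n_epochs=200,
    batch_size=20,            # as in the batch walkthrough above
    random_state=42,
)
rnn.fit(train_scaled)

# Forecast 48 months and map the result back to the original scale
pred_rnn = scaler.inverse_transform(rnn.predict(len(val)))
```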
But what if we could also give the model information about the month?
Meaning, we create covariates that will be known when the model predicts the future months (called “future covariates”). Darts makes this easy. We will start by creating a timeseries for the year and one for the months as a binary (one-hot) encoding. In the end we will use only the month covariates, because the years are not going to repeat in the future, so we don’t need them:
This way, we provide the model with information about which month it is (is it January? 0 or 1, and so on) and include it in the model training:
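A sketch of how those month covariates could be created and passed to the model (reusing the scaled series and scaler from before):

```python
from darts.models import RNNModel
from darts.utils.timeseries_generation import datetime_attribute_timeseries

# Year covariate (created for illustration, not used in the end)
year_cov = datetime_attribute_timeseries(series, attribute="year")

# One-hot (binary) encoded month covariate, built over the full series so it is
# also known over the forecast horizon (a "future covariate")
month_cov = datetime_attribute_timeseries(series, attribute="month", one_hot=True)

rnn_cov = RNNModel(
    model="LSTM",
    input_chunk_length=120,
    training_length=240,
    n_epochs=200,
    random_state=42,
)
rnn_cov.fit(train_scaled, future_covariates=month_cov)
pred_rnn_cov = scaler.inverse_transform(
    rnn_cov.predict(len(val), future_covariates=month_cov)
)
```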
So it seems the information about months didn’t help the model, and the univariate timeseries RNN works better in this case.
2.4 TBATS
TBATS models stand for:
Trigonometric components are used to model multiple seasonal patterns, especially non-integer and complex seasonalities.
Box-Cox transformation is applied to stabilize the variance in the data.
ARMA (AutoRegressive Moving Average) errors model captures short-term autocorrelations in the residuals.
Trend components (level and slope) capture long-term trends in the data.
Seasonal components can represent multiple overlapping seasonal cycles (e.g., daily and yearly).
They are appropriate for modeling complex seasonal time series, such as those with multiple seasonal periods, high-frequency seasonality, non-integer seasonality and dual-calendar effects.
Darts lets us provide different seasonalities, as well as choose whether to include Trend, ARMA errors and/or the Box-Cox transformation in the modeling. The model can try all combinations (include trend or not, include the transformation or not, etc.) and select the one with the best performance:
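A sketch, passing the 124-month seasonality and letting the model search the remaining options (as far as I recall, None lets the underlying tbats package try both variants and keep the better fit):

```python
from darts.models import TBATS

tbats = TBATS(
    seasonal_periods=[124],   # the strongest seasonality we detected
    use_trend=None,           # None: try with and without a trend
    use_box_cox=None,         # None: try with and without the Box-Cox transformation
    use_arma_errors=True,     # model residuals with ARMA if it improves the fit
)
tbats.fit(train)
pred_tbats = tbats.predict(len(val))
```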
Well, this one is not working well on this dataset 😅. Let’s move on to the others!
2.5 N-BEATS / N-HiTS
Both models take a neural network approach designed for time series forecasting, built on stacks of fully connected layers (MLPs) that learn different aspects of the series, such as trend and seasonality.
Starting with N-BEATS (standing for “Neural Basis Expansion Analysis Time Series”), it uses a stack of fully connected layers (MLPs) organized into blocks. Each block learns different aspects of the time series data, such as trends, seasonality, or other patterns.
It is also highly interpretable, through its basis expansion approach, where each block can be designed to focus on specific components of the time series, making the outputs understandable.
The model makes forecasts by recursively predicting future values, allowing it to forecast multiple steps ahead effectively.
Now, N-HiTS (“Neural Hierarchical Interpolation for Time Series”) is an improvement over N-BEATS, that introduces a hierarchical decomposition strategy, where forecasts are generated at different scales or resolutions. This helps in capturing both short-term and long-term patterns in the data more effectively.
It uses a combination of interpolation and extrapolation techniques to better estimate future values of a time series. This approach allows N-HiTS to handle missing data and make robust forecasts over different time horizons.
Like N-BEATS, N-HiTS decomposes the time series into different components (such as trend, seasonality, and residuals), but does so in a more structured, hierarchical manner, which helps capture the multi-scale nature of time series data. As a result, it has fewer parameters to train and is consequently faster than N-BEATS, usually with better performance as well.
Now, let’s see how it is implemented in Darts. We are going to train the model for 80 epochs (like with the RNNs) and forecast 20 values at once (that’s the “output_chunk_length”). To do this, the model will look back at the previous 100 values (that’s the “input_chunk_length”), i.e. 5 times the output length.
All the remaining hyperparameters (“num_stacks”, “num_blocks” and “num_layers” being the most important) stay at their defaults, as I got the best overall performance with them after multiple iterations.
The validation set is larger than 20, so the first 20 predictions will be used to forecast the next ones, and so on.
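A sketch of both models with that configuration (the random_state is an assumption; the scaled series and scaler from the RNN section are reused):

```python
from darts.models import NBEATSModel, NHiTSModel

nbeats = NBEATSModel(
    input_chunk_length=100,
    output_chunk_length=20,
    n_epochs=80,
    random_state=42,
)
nbeats.fit(train_scaled)
pred_nbeats = scaler.inverse_transform(nbeats.predict(len(val)))

nhits = NHiTSModel(
    input_chunk_length=100,
    output_chunk_length=20,
    n_epochs=80,
    random_state=42,
)
nhits.fit(train_scaled)
pred_nhits = scaler.inverse_transform(nhits.predict(len(val)))
```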
Step 3: Evaluate results
Now we are able to compare all models and evaluate their performance on the same dataset:
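One way to compare everything at once (the names follow the sketches above):

```python
from darts.metrics import mape

predictions = {
    "SES": pred_ses,
    "Theta": pred_theta,
    "Linear Regression": pred_lr,
    "Random Forest": pred_rf,
    "RNN": pred_rnn,
    "RNN + month covariates": pred_rnn_cov,
    "TBATS": pred_tbats,
    "N-BEATS": pred_nbeats,
    "N-HiTS": pred_nhits,
}

for name, pred in predictions.items():
    print(f"{name:>24}: MAPE = {mape(val, pred):.2f}%")
```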
The best results came from the RNN. They might not be great in absolute terms, but don’t forget we are forecasting 4 years ahead! There is plenty of uncertainty here.
Conclusion
In this article we experimented with multiple models from the Darts library in Python, trying to forecast a simple Time Series dataset.
The main goal of this article was to get you familiar with Darts, some of its well-known models and their configuration, rather than to optimize the results. With more careful configuration, we could improve performance even further!
Some things to keep in mind before you go:
Darts provides an easy-to-use framework to transform your Time Series and apply multiple models, ranging from statistical ones up to neural networks.
Hyperparameters like training_length, input/output_chunk_length can heavily affect the training and prediction phases, because they define how many data points are considered to predict some other data points. So, please don’t neglect them.
Covariates don’t always help.
Results depend on both model and nature of the data.
I am also using this library in my day-to-day work for Time Series projects and exploring its potential for data handling and modeling, so I am learning with you along the way!
In future articles we will explore Darts with external covariates to understand how to better utilize them and improve our modeling performance.
Thanks for reading!