Time Series Episode 5: Getting Started with “Darts”

Introduction

Hi there! Happy to see you again in this series of articles, where we discuss Time Series theory and examples.

In the previous articles we covered ARIMA-family models for forecasting, with working examples of how to apply them based on my experience from multiple projects so far.

Now let’s work on something different from ARIMA. I recently started working with the “Darts” library in Python, which contains lots of forecasting algorithms and makes it very easy to do feature engineering, apply and compare whichever models you want, and evaluate the results in the end.

According to their page:

“Darts is a Python library for user-friendly forecasting … using fit() and predict() functions, similar to scikit-learn. The library also makes it easy to backtest models, combine the predictions of several models, and take external data into account. Darts supports both univariate and multivariate time series and models. The ML-based models can be trained on potentially large datasets containing multiple time series, and some of the models offer a rich support for probabilistic forecasting.”

We will work with the same datasets as in the previous articles, because they are easy to understand and make it simple to demonstrate Darts’ algorithms (but there will be more real-world datasets in future articles 🙂).

You ready? Let’s start!


Step-by-Step Working Example

We begin with the dataset described in my first hands-on tutorial:

Time Series Episode 1: How to select the correct SARIMA parameters

After investigation of parameters and multiple iterations, we had arrived at this model:

SARIMA(2,1,1)(1,0,1,12)

Our forecasts on Dataset 1 (from Episode 1)

Before you start complaining “again with this airline-passengers dataset?? enough is enough!!”, let me defend myself by highlighting this dataset’s simplicity and ease of use, which make it helpful for understanding the basic concepts behind the algorithms we are going to discuss. As I promised, more difficult datasets will be featured in this Series’ future Episodes, so subscribe, follow and stay alerted!

For now let’s focus on this one and start experimenting:


Step 1: Read and transform the data

We begin by reading the data from a CSV file, transforming it into the special “TimeSeries” object required by Darts, plotting the series and checking for seasonality:

Image 1 — Plot the series

Darts provides lots of useful capabilities for data engineering and statistical tests, along with all the forecasting models.

The “check_seasonality” test tells us that the time series has seasonality of 12 months, which is also very clear from Image 1. We see the same pattern repeating itself every 12 months (data points), with an increasing trend.

To begin the testing, let’s keep everything before 1959 as the training set and the last two years as the evaluation set:

So we will have 120 months for training and will try to predict the last 24 months (2 years).


Step 2: Train multiple models

2.1 Simple Exponential Smoothing, AutoARIMA, Theta

Let’s start with SES, which gives the most weight to the most recent past data:

Then we train an AutoARIMA model, where we provide the seasonality of 12 months and let the model decide the values of p, d, q and P, D, Q on its own:

And we also train a Theta model, which is similar to SES but applies the two theta lines to consider general seasonality and recent trend:

Now let’s plot the results and evaluate with MAPE (Mean Absolute Percentage Error):

Image 2 — Results of SES, AutoARIMA, Theta

The results vary, and SES is best among them with a 7.62% MAPE. If you look closely, you will also see the confidence interval range of its predictions.

All models capture the overall pattern, but AutoARIMA and Theta don’t seem to catch the upward trend very well, especially further into the future.

2.2 Linear Regression

Thankfully, Darts makes it easy to test even algorithms that are most appropriate for tabular data. One of them is Linear Regression, which can consider whatever lags we want.

Image 3 — Results of Linear Regression where “output_chunk_length” = 1

Results look good, and better than previously, but please pay attention to one key parameter: “output_chunk_length”.

This can be a very important parameter, which according to the documentation:

“output_chunk_length”: Number of time steps predicted at once (per chunk) by the internal model. It is not the same as the forecast horizon used in predict(), which is the desired number of prediction points generated using a one-shot or autoregressive forecast.

Basically, this means how many months are predicted with every model call. So, in the setup above, the training uses 12 lags (i.e. 12 months in the past) to predict the next month (output_chunk_length = 1).

As such, to predict the evaluation set, we use the last 12 months of the training set to predict its 1st month. Then the next 12 months (including that 1st predicted month) to predict the 2nd month, and so on.

But what happens if we instead use 12 months to predict the next 12 months all at once?

Image 4 — Results of Linear Regression where “output_chunk_length” = 12

2.3 Random Forest

Now let’s see how a tree-based algorithm can be utilized for Time Series data.

Random Forest is an ensemble model known for its high accuracy and robustness. It is better suited for tabular data; however, Darts makes it usable for Time Series by treating the lagged values of the series as features for training the model. That way, we can select some lags to represent the “X features” for training, but we should keep in mind that there will be no sequence in them.

For example, you might think “ok let’s select 12 lags as features for the Random Forest and try to predict the next month.” Yeah ok, but there are two problems:

  1. The model will select only a random subset of the features (because that’s how it works, unless you change it), and will also filter out the correlated ones.

  2. It treats the lags as tabular features, meaning that even if it considers e.g. lags 1, 2, 3, 4, it doesn’t understand that lag 1 is more recent than lag 2, which is more recent than lag 3, etc. So you eliminate the sequential nature of the Time Series.

That’s why this model needs a slightly different way of thinking.

Let’s say we care about seasonality and trend. One easy solution is to select only lags 1 & 2 (i.e. the last 2 months) to capture the recent trend, and lag 12 (i.e. 12 months ago, the same month last year) to capture seasonality, and try every time to predict the next month.

Let’s see:

Image 5 — Results of Random Forest

Not as good as the previous models. It captures the seasonality but not the increasing trend. We can experiment with different combinations of lags, but it still doesn’t seem promising.

2.4 RNN

Neural Networks are another model family provided by Darts, in an effort to make them easily available for Time Series forecasting.

RNNs (Recurrent Neural Networks) are a type of Neural Network that processes sequential Time Series data and has a recurrent connection that allows it to maintain a memory of past inputs. This allows RNNs to learn patterns and dependencies in sequential data and make predictions based on the context of the input sequence.

However, RNNs can suffer from the Vanishing Gradient Problem (where the gradients of the loss function become very small, which prevents the earliest layers from learning), as well as the Exploding Gradient Problem (where the gradients become very large and cause the weights of the network to become unstable). Also, when given long sequences of training data, RNNs can struggle to capture long-term dependencies.

That’s where LSTM (Long Short-Term Memory) models come into play, as they are significant for time series forecasting due to their ability to capture long-term dependencies and handle sequential data more effectively.

You can find multiple online sources to learn more about them, so let’s focus here on how to use them with Darts (and also, let’s consider that you know some basics about implementing Neural Networks in Python).

Darts gives us the ability to bring in an LSTM model and configure its hyperparameters. Apart from the usual ones (number of epochs, number of hidden units, learning rate, dropout etc.), two important parameters are:

  1. training_length: the number of data points (months) used in each training sample; it is equal to the combined length of the input and output data points. Here we set it to 24, so two years per training sample.

  2. input_chunk_length: the number of past data points (months) used for predictions. The RNN will look back 12 months from the moment of prediction, i.e. a full seasonal cycle, to compute its forecasts. If we increase the value, we force the RNN to rely more on its long-term memory. However, this parameter should not exceed the previous one (training_length).

Let’s see what we get:

Image 6 — Results of RNN (trained on original data)

Terrible results! 😄

But don’t worry, we know why!

Neural Networks work best with data scaled to the 0–1 range, because activation functions like the sigmoid and hyperbolic tangent squash their inputs into a small output range. If the input features are not scaled, the gradients during backpropagation can become very large or very small, leading to unstable training and poor performance.

Let’s scale the data (as provided by Darts) and try again:

Image 7 — Results of RNN (trained on scaled data)

Great improvement!

Training on scaled data (and bringing the predictions back to the original scale) clearly gives far better results than before.

However, still not as good as other algorithms, and that’s something not to blame the model for.

We see that the first year was forecasted better than the second, which suggests the model works better for the near future, and that’s ok.

Now, what could happen if we give the model more information?

We can observe that we have a yearly seasonality: the same pattern repeats in the same month across different years. Can we provide this information to the model?

Yes: Darts gives us the ability to create additional TimeSeries instances from datetime attributes, with a numeric encoding for the years and a binary (one-hot) encoding for the months.

That way, for every month’s prediction we give the model information about the year and month being predicted. These additional series are called “covariates” (or “exogenous variables”, as we saw in previous articles).

Let’s see now:

Image 8 — Results of RNN (trained on scaled data with covariates)

Yes, now that’s an improvement!


Step 3: Evaluate results

Now we are able to compare all models and evaluate their performance on the same dataset:

Image 9 — Model comparison

The Linear Regression that uses 12 months to predict the next 12 months at once gives the best results among all models.

That’s a reminder that sometimes a simpler model can outperform some more fancy ones 😉


Conclusion

In this article we experimented with multiple models from the Darts library in Python, trying to forecast a simple Time Series dataset.

Some things to keep in mind after all the experiments:

  1. Darts provides an easy-to-use framework to transform your Time Series and apply multiple models, ranging from statistical ones up to neural networks.

  2. Hyperparameters like training_length and input/output_chunk_length can heavily affect the training and prediction phases, because they define how many data points are used to predict how many others. So please don’t neglect them.

  3. Covariates can potentially help the model, so make use of them when you can.

  4. A simpler model can end up as the best solution.

  5. And, of course, re-iterate and experiment with more models provided by Darts!


You can also read my original article published on Medium by In Plain English:

Time Series Episode 5: Getting Started with “Darts”

Thanks for reading!
