overwrite min_train_series_length in gradient_boosted_model and catboost_model #1214

@anne-devries

Description

Is your feature request related to a current problem? Please describe.
Currently the min_train_series_length of regression models is defined as:
max(3, -self.lags["target"][0] + self.output_chunk_length)

However, I believe the minimum required series length is not the same for every model inheriting from RegressionModel. In particular, for LightGBM and CatBoost the min_train_series_length should be
max(3, -self.lags["target"][0] + self.output_chunk_length + 1)
Without the +1, a series of exactly min_train_series_length yields a single training sample, whereas LightGBM and CatBoost require at least two samples when calling fit(). Below is an illustration of how this plays out for LightGBM.

In the example below:
date     quantity
2021-01  10
2021-02  8
2021-03  14
2021-04  7
2021-05  6
2021-06  5
With self.lags["target"][0] = -4 and self.output_chunk_length = 2, this results in a min_train_series_length of max(3, 4 + 2) = 6.

df_X_y, created in the function _create_lagged_data, then contains only 1 sample, because all rows with NaN values are removed.
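The sample count can be checked with a small stand-alone sketch. It does not call darts; build_samples is a hypothetical helper that mimics the windowing described above, assuming target lags -4 to -1 (written as positive offsets) and an output chunk length of 2.

```python
# Hypothetical helper, not the darts implementation: it only mirrors the idea
# that each training row needs `max(lags)` past values plus
# `output_chunk_length` future values, and incomplete rows are dropped.
def build_samples(values, lags, output_chunk_length):
    max_lag = max(lags)
    samples = []
    # t is the index of the first value to predict
    for t in range(max_lag, len(values) - output_chunk_length + 1):
        features = [values[t - lag] for lag in lags]
        targets = values[t:t + output_chunk_length]
        samples.append((features, targets))
    return samples


quantity = [10, 8, 14, 7, 6, 5]  # 2021-01 .. 2021-06 from the example
lags = [4, 3, 2, 1]              # corresponds to target lags -4 .. -1

six_points = build_samples(quantity, lags, output_chunk_length=2)
print(len(six_points))  # 1 -> ([10, 8, 14, 7], [6, 5])

# One extra observation (an arbitrary illustrative value) gives two samples.
seven_points = build_samples(quantity + [9], lags, output_chunk_length=2)
print(len(seven_points))  # 2
```

With 6 observations only one complete row exists, so a series of length -lags[0] + output_chunk_length + 1 = 7 is the shortest that yields the two rows these models need.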

The scikit-learn input check that LightGBM runs in fit(),
_LGBMCheckXY(X, y, accept_sparse=True, force_all_finite=False, ensure_min_samples=2), throws an error for this example because it requires at least 2 samples.

For random forest the required number of samples is the default of 1, but for CatBoost it is also 2. Although no explicit check is performed, fitting CatBoost on a single training sample fails because (I believe) it cannot handle the case in which all features are constant, which is always true when only a single training sample is passed.

Describe proposed solution
Override min_train_series_length in gradient_boosted_model and catboost_model so that it returns:
max(3, -self.lags["target"][0] + self.output_chunk_length + 1)
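A minimal sketch of what such an override could look like is given here. The classes are stand-ins invented for illustration (RegressionModelSketch and LightGBMModelSketch are not darts classes); only the two formulas are taken from this issue, and the real models may store lags differently.

```python
# Stand-in base class: reproduces only the current formula quoted in the issue.
class RegressionModelSketch:
    def __init__(self, lags, output_chunk_length=1):
        # Target lags are negative and sorted ascending, e.g. [-4, -3, -2, -1],
        # so index 0 holds the lag furthest in the past.
        self.lags = {"target": sorted(lags)}
        self.output_chunk_length = output_chunk_length

    @property
    def min_train_series_length(self):
        return max(3, -self.lags["target"][0] + self.output_chunk_length)


# Stand-in subclass: overrides the property with the proposed "+ 1" so that
# the shortest accepted series produces two training samples instead of one.
class LightGBMModelSketch(RegressionModelSketch):
    @property
    def min_train_series_length(self):
        return max(3, -self.lags["target"][0] + self.output_chunk_length + 1)


base = RegressionModelSketch(lags=[-4, -3, -2, -1], output_chunk_length=2)
boosted = LightGBMModelSketch(lags=[-4, -3, -2, -1], output_chunk_length=2)
print(base.min_train_series_length)     # 6
print(boosted.min_train_series_length)  # 7
```

Keeping the change in the subclasses leaves the base formula untouched for models, such as random forest, that can fit on a single sample.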

Describe potential alternatives

Additional context
