Description
Is your feature request related to a current problem? Please describe.
Currently the `min_train_series_length` of regression models is defined as:

```python
max(3, -self.lags["target"][0] + self.output_chunk_length)
```
However, the minimum required length of the series is not necessarily the same for every model inheriting from `RegressionModel`. In particular, I believe that for LightGBM and CatBoost the `min_train_series_length` should be:

```python
max(3, -self.lags["target"][0] + self.output_chunk_length + 1)
```
The reason is that without the `+ 1`, a series of exactly `min_train_series_length` points produces only a single training sample, whereas LightGBM and CatBoost require at least two samples when calling `fit()`. Below is an illustration of how this plays out for LightGBM, using the following example series:
| date    | quantity |
|---------|----------|
| 2021-01 | 10       |
| 2021-02 | 8        |
| 2021-03 | 14       |
| 2021-04 | 7        |
| 2021-05 | 6        |
| 2021-06 | 5        |
With `self.lags["target"][0] = -4` and `self.output_chunk_length = 2`, this results in a `min_train_series_length` of `max(3, 4 + 2) = 6`.
`df_X_y`, created in the function `_create_lagged_data`, then contains only 1 sample, as all rows with NaN values (introduced by the lagging) are removed.
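The sample-count arithmetic above can be sketched with a small helper. The function name is hypothetical and only illustrates how many complete lagged windows fit into a series of a given length:

```python
def n_training_samples(series_length: int, min_lag: int, output_chunk_length: int) -> int:
    """Number of complete (features, target) windows in a series.

    Each training sample consumes `-min_lag` past points for the lagged
    features plus `output_chunk_length` future points for the targets.
    """
    window = -min_lag + output_chunk_length  # points consumed per sample
    return max(0, series_length - window + 1)

# With lags down to -4 and output_chunk_length = 2, a series of length 6
# (the current min_train_series_length) yields exactly one sample:
print(n_training_samples(6, -4, 2))  # 1
# One extra point would yield the two samples LightGBM/CatBoost need:
print(n_training_samples(7, -4, 2))  # 2
```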
LightGBM's scikit-learn-style input check, `_LGBMCheckXY(X, y, accept_sparse=True, force_all_finite=False, ensure_min_samples=2)`, will throw an error for this example.
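That check can be reproduced in isolation. I believe `_LGBMCheckXY` is an alias for scikit-learn's `check_X_y`, so the sketch below calls `check_X_y` directly with the same `ensure_min_samples=2` constraint on a single-sample input (the feature values are taken from the example series above):

```python
import numpy as np
from sklearn.utils import check_X_y

# One training sample: four lag features and one target value.
X = np.array([[10.0, 8.0, 14.0, 7.0]])
y = np.array([6.0])

try:
    check_X_y(X, y, ensure_min_samples=2)
except ValueError as e:
    # scikit-learn rejects the input because only 1 sample is present
    print(e)
```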
For random forest the required minimum number of samples is the default of 1, but for CatBoost it is effectively also 2: although no explicit check is performed, fitting CatBoost on a single training sample fails because (I believe) it cannot handle the situation in which all features are constant, which is necessarily the case when only a single training sample is passed.
Describe proposed solution
Override `min_train_series_length` in `gradient_boosted_model` and `catboost_model` as:

```python
max(3, -self.lags["target"][0] + self.output_chunk_length + 1)
```
Describe potential alternatives
Additional context