Unclear behavior of max_train_size argument in TimeSeriesSplit

Description

I am trying to understand the behavior of TimeSeriesSplit, especially the max_train_size parameter. I was initially surprised that it is an absolute number of samples and not a ratio, as it is in other splitting operations.

I traced this parameter to issue #8249 and PR #8282 and realized that it was added to support window-based splitting, as described here. This surprised me, because the documentation does not make it clear that this is what the parameter is for. Moreover, I found the parameters initialWindow, horizon, and fixedWindow much easier to understand, especially with that image.
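To make the current behavior concrete, here is a minimal sketch of how I understand max_train_size to work. The toy data (12 samples) and the helper name show_splits are my own, chosen only for illustration. With max_train_size=None the training set grows with every split; with an absolute cap it turns into a sliding window once enough samples are available.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Toy data: 12 samples, one feature. The values are irrelevant here,
# only the sample indices matter.
X = np.arange(12).reshape(-1, 1)


def show_splits(splitter, X):
    """Collect (train_indices, test_indices) pairs as plain lists."""
    return [(train.tolist(), test.tolist()) for train, test in splitter.split(X)]


# Default: the training set expands with every split.
expanding = show_splits(TimeSeriesSplit(n_splits=3), X)

# max_train_size is an absolute number of samples, not a ratio:
# the training set is truncated to at most the 4 most recent samples.
capped = show_splits(TimeSeriesSplit(n_splits=3, max_train_size=4), X)

for (train_a, test), (train_b, _) in zip(expanding, capped):
    print("test", test, "| expanding train", train_a, "| capped train", train_b)
```

Note that the cap only takes effect once more than max_train_size samples precede the test block, so the first split can still have a shorter training set than later ones.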

I would suggest that:

The documentation is improved here. A visualization like the one shown in https://topepo.github.io/caret/data-splitting.html#data-splitting-for-time-series would help a lot.
We consider using either sample-based parameters, like initialWindow, horizon, and fixedWindow, or ratio/fold-based ones, but not both, because mixing the two is very confusing.

If splitting is done by number of folds (which I prefer, because it adapts to different dataset sizes automatically), then the window size should also be expressed in folds. The parameters could then be:

How many folds to do.
Number of folds used in horizon, i.e., used in test data. It looks like this is currently fixed to 1 in this splitting operation and cannot really be configured. I suggest we allow this to be configured.
Number of folds used in the window, i.e., training data. The default could be None, which would mean a non-fixed window that uses all folds before the test data. Setting it to a fixed number would give a sliding window.
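The three parameters above could be sketched roughly as follows. This is only an illustration of the proposal, not existing sklearn API: the function fold_window_split and its arguments n_folds, horizon_folds, and window_folds are hypothetical names I made up, and the choice to skip splits whose window would be incomplete is my own assumption.

```python
def fold_window_split(n_samples, n_folds, horizon_folds=1, window_folds=None):
    """Yield (train_indices, test_indices) with all sizes expressed in folds.

    Hypothetical sketch of the proposal, not part of sklearn.
    n_folds:       number of contiguous folds the samples are cut into.
    horizon_folds: number of folds used as test data in each split.
    window_folds:  number of folds used as training data; None means
                   "all folds before the test data" (expanding window).
    """
    if n_folds < 2 or n_samples < n_folds:
        raise ValueError("need at least 2 folds and one sample per fold")
    if horizon_folds < 1 or horizon_folds >= n_folds:
        raise ValueError("horizon_folds must be in [1, n_folds - 1]")
    if window_folds is not None and not 1 <= window_folds <= n_folds - horizon_folds:
        raise ValueError("window_folds must be in [1, n_folds - horizon_folds]")

    # Fold i covers sample indices [bounds[i], bounds[i + 1]).
    bounds = [i * n_samples // n_folds for i in range(n_folds + 1)]

    # Assumption: with a fixed window, start only once a full window exists.
    first_test_fold = 1 if window_folds is None else window_folds
    for test_fold in range(first_test_fold, n_folds - horizon_folds + 1):
        train_fold = 0 if window_folds is None else test_fold - window_folds
        train = list(range(bounds[train_fold], bounds[test_fold]))
        test = list(range(bounds[test_fold], bounds[test_fold + horizon_folds]))
        yield train, test


# Expanding window, one test fold per split.
expanding = list(fold_window_split(12, n_folds=4))
# Sliding window of two folds, one test fold per split.
sliding = list(fold_window_split(12, n_folds=4, window_folds=2))
# Sliding window of one fold, two test folds per split.
long_horizon = list(fold_window_split(12, n_folds=4, horizon_folds=2, window_folds=1))
```

Because everything is expressed in folds, the same arguments would give proportionally larger windows and horizons on a larger dataset, which is the adaptivity I am asking for.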

Versions

This relates to the behavior in sklearn v0.20.3.
