Alternative to TimeSeriesSplit with better parametrisation #18674

jayzed82 · 2020-10-22T13:30:52Z

This is an alternative to TimeSeriesSplit for causal (time-series) cross-validation with rolling train/test windows, using a better parametrisation.

It takes the following arguments: train window size (`train_size`), test window size (`test_size`), gap (also known as "embargo") size (`gap`), and step size (`step`). Optionally, the train window can be made expanding (`expanding=True`).

The number of folds depends on the data length and the split parameters.
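As a worked version of that statement, assume (as both examples below show) that fold $k$, counting from 0, ends its train window at `train_size` + $k \cdot$ `step`, and that a fold is kept only while its whole test window still fits inside the $n$ samples. Under that assumption the fold count is:

```math
K = \left\lfloor \frac{n - (\mathrm{train\_size} + \mathrm{gap} + \mathrm{test\_size})}{\mathrm{step}} \right\rfloor + 1,
\qquad n \ge \mathrm{train\_size} + \mathrm{gap} + \mathrm{test\_size}
```

For the first example below this gives $\lfloor (20 - 13)/3 \rfloor + 1 = 3$ folds, and for the second $\lfloor (20 - 8)/2 \rfloor + 1 = 7$ folds, matching the printed output. The formula is the same for rolling and expanding windows, because expanding only changes where the train window starts, not where it ends. If $n$ is smaller than `train_size` + `gap` + `test_size`, no complete fold fits.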

```python
import numpy as np

# RollingSplit is the splitter proposed in this PR; it was never merged into scikit-learn.
data = np.arange(20)

rsplit = RollingSplit(train_size=7, test_size=4, gap=2, step=3, expanding=False)
for train, test in rsplit.split(data):
    print(train, ' ', test)
```

```
[0 1 2 3 4 5 6]   [ 9 10 11 12]
[3 4 5 6 7 8 9]   [12 13 14 15]
[ 6  7  8  9 10 11 12]   [15 16 17 18]
```

```python
rsplit = RollingSplit(train_size=4, test_size=4, gap=0, step=2, expanding=True)
for train, test in rsplit.split(data):
    print(train, ' ', test)
```

```
[0 1 2 3]   [4 5 6 7]
[0 1 2 3 4 5]   [6 7 8 9]
[0 1 2 3 4 5 6 7]   [ 8  9 10 11]
[0 1 2 3 4 5 6 7 8 9]   [10 11 12 13]
[ 0  1  2  3  4  5  6  7  8  9 10 11]   [12 13 14 15]
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13]   [14 15 16 17]
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15]   [16 17 18 19]
```
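The window arithmetic behind these folds can be sketched as a plain generator. This is a minimal, hypothetical re-implementation for illustration only: the function name `rolling_split` and its signature (it takes the sample count rather than the data) are assumptions, not the code in this PR.

```python
import numpy as np


def rolling_split(n_samples, train_size, test_size, gap=0, step=1, expanding=False):
    """Yield (train, test) index arrays over rolling or expanding windows.

    Hypothetical sketch of the behaviour described in this PR; not the PR's code.
    """
    if min(train_size, test_size, step) < 1 or gap < 0:
        raise ValueError("train_size, test_size and step must be >= 1, gap >= 0")
    indices = np.arange(n_samples)
    train_end = train_size  # exclusive end of the first train window
    # Keep producing folds while the whole test window fits in the data.
    while train_end + gap + test_size <= n_samples:
        # Rolling: fixed-length train window; expanding: always start at 0.
        train_start = 0 if expanding else train_end - train_size
        test_start = train_end + gap  # skip `gap` samples (the embargo)
        yield (indices[train_start:train_end],
               indices[test_start:test_start + test_size])
        # Advance the train end, and with it the test window, by `step`.
        train_end += step


data = np.arange(20)
for train, test in rolling_split(len(data), train_size=7, test_size=4, gap=2, step=3):
    print(train, ' ', test)
```

With the parameters of the two examples above it yields the same 3 and 7 folds. Rejecting `step < 1` up front matters here: with a zero step the train end would never advance and the loop would not terminate.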


thomasjpfan · 2022-04-14T19:26:13Z

Thank you for the PR. It's often better to open an issue before creating a PR, to discuss whether the feature is a fit for inclusion in scikit-learn.

In this case, this feature is closely related to #22523. Personally, I am -1 on adding another splitter that is a reparametrization of an existing one (TimeSeriesSplit), because it makes it harder for users to figure out which splitter to use. With that in mind, I am closing this PR. You are welcome to continue the discussion in #22523.
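To make the overlap discussed in this comment concrete, here is the closest configuration of scikit-learn's existing `TimeSeriesSplit` to the first `RollingSplit` example, assuming scikit-learn 0.24 or later (where `test_size` and `gap` are available). `max_train_size`, `test_size` and `gap` map onto `train_size`, `test_size` and `gap`, but there is no counterpart to `step`: consecutive test windows always advance by `test_size`, and the folds are anchored to the end of the data rather than the start.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

data = np.arange(20)

# max_train_size caps the train window length (like RollingSplit's train_size);
# the step between folds is fixed to test_size by TimeSeriesSplit itself.
tscv = TimeSeriesSplit(n_splits=3, max_train_size=7, test_size=4, gap=2)
for train, test in tscv.split(data):
    print(train, ' ', test)
```

This yields test windows 8-11, 12-15 and 16-19, and the first train window is only 0-5 because the gap leaves fewer than 7 samples before the first test window. Only the middle fold (train 3-9, test 12-15) coincides with the `RollingSplit` output; the decoupled `step` and start-anchored windows are the main behavioural differences in this example.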

jayzed82 added 2 commits October 22, 2020 15:05

Implements new splitter: RollingSplit. It is an alternative to TimeSeriesSplit which allows better control of sliding windows behaviour.

c7c5f39

Implements new splitter: RollingSplit. It is an alternative to TimeSeriesSplit which allows better control of sliding windows behaviour.

eb5e1c6

github-actions bot added the module:model_selection label Oct 22, 2020

RollingSplit: fix linting

e862d22

Base automatically changed from master to main January 22, 2021 10:53

jayzed82 and others added 3 commits May 10, 2021 14:47

Merge remote-tracking branch 'upstream/master'

104000d

Merge to upstream

Merge branch 'scikit-learn:main' into master

0ec6afd

Merge branch 'scikit-learn:main' into master

f52c0d4

thomasjpfan closed this Apr 14, 2022
