Alternative to TimeSeriesSplit with better parametrisation by jayzed82 · Pull Request #18674 · scikit-learn/scikit-learn · GitHub

Alternative to TimeSeriesSplit with better parametrisation #18674


Closed
jayzed82 wants to merge 6 commits

jayzed82 (Contributor)

This is an alternative to TimeSeriesSplit to perform causal (time)series cross-validation with rolling train/test windows using better parametrisation.

It takes four arguments: the train window size (train_size), the test window size (test_size), the gap (aka "embargo") size (gap), and the step size (step). Optionally, the train window can be made expanding (expanding=True).

The number of folds will depend on the data length and split parameters.
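As a rough illustration of that dependence, the fold count can be written in closed form if one assumes that a fold is only produced when its complete test window fits in the data, which is consistent with the two examples in this description. The helper name `count_folds` is hypothetical and is not part of the PR.

```python
def count_folds(n_samples, train_size, test_size, gap=0, step=1):
    """Number of complete (train, gap, test) windows that fit in n_samples.

    Hypothetical helper, not the PR's code. Assumes a fold is only produced
    when its whole test window fits inside the data.
    """
    if step < 1:
        raise ValueError("step must be a positive integer")
    # Samples covered by the first fold: train window + gap + test window.
    first_fold_span = train_size + gap + test_size
    if first_fold_span > n_samples:
        return 0
    # Every further fold shifts the windows forward by `step` samples.
    return (n_samples - first_fold_span) // step + 1


print(count_folds(20, train_size=7, test_size=4, gap=2, step=3))  # 3
print(count_folds(20, train_size=4, test_size=4, gap=0, step=2))  # 7
```

Under this assumption the `expanding` option does not change the count, since it only moves the start of the train window.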

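import numpy as np

# RollingSplit is the splitter class proposed in this PR.
data = np.arange(20)

# Rolling (fixed-size) train window
rsplit = RollingSplit(train_size=7, test_size=4, gap=2, step=3, expanding=False)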
for train, test in rsplit.split(data):
    print(train, ' ', test)

>> [0 1 2 3 4 5 6]   [ 9 10 11 12]
>> [3 4 5 6 7 8 9]   [12 13 14 15]
>> [ 6  7  8  9 10 11 12]   [15 16 17 18]

# Expanding train window
rsplit = RollingSplit(train_size=4, test_size=4, gap=0, step=2, expanding=True)
for train, test in rsplit.split(data):
    print(train, ' ', test)

>> [0 1 2 3]   [4 5 6 7]
>> [0 1 2 3 4 5]   [6 7 8 9]
>> [0 1 2 3 4 5 6 7]   [ 8  9 10 11]
>> [0 1 2 3 4 5 6 7 8 9]   [10 11 12 13]
>> [ 0  1  2  3  4  5  6  7  8  9 10 11]   [12 13 14 15]
>> [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13]   [14 15 16 17]
>> [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15]   [16 17 18 19]
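The windowing logic behind these outputs can be sketched in a few lines. This is a minimal illustration and not the PR's implementation: the function name `rolling_split_indices` is hypothetical, it yields plain Python lists rather than NumPy arrays, and it assumes that only complete test windows are emitted.

```python
def rolling_split_indices(n_samples, train_size, test_size, gap=0, step=1,
                          expanding=False):
    """Yield (train, test) index lists for rolling or expanding windows.

    Hypothetical sketch, not the PR's code. Assumes a fold is only yielded
    when its whole test window fits inside the data.
    """
    if step < 1:
        raise ValueError("step must be a positive integer")
    train_end = train_size  # exclusive end of the first train window
    while train_end + gap + test_size <= n_samples:
        # An expanding window always starts at 0; a rolling one keeps its size.
        train_start = 0 if expanding else train_end - train_size
        test_start = train_end + gap  # skip `gap` samples after the train window
        yield (list(range(train_start, train_end)),
               list(range(test_start, test_start + test_size)))
        train_end += step  # move every window forward by `step`


for train, test in rolling_split_indices(20, train_size=7, test_size=4,
                                         gap=2, step=3):
    print(train, test)
```

With the parameters of the first example, this loop produces the same three train/test index sets shown above; passing `expanding=True` with the second example's parameters gives seven folds whose train windows all start at index 0.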

…riesSplit which allows better control of sliding windows behaviour.
Base automatically changed from master to main January 22, 2021 10:53
thomasjpfan (Member)

Thank you for the PR. It's often better to open an issue before creating a PR to discuss whether the feature is a good fit for inclusion in scikit-learn.

In this case, this feature is closely related to #22523. For me, I am -1 on adding another Splitter that is a reparametrization of another splitter (TimeSeriesSplit). It makes it harder for users to figure out "which splitter" to use. With that in mind I am closing this PR. You are welcome to continue the discussion in #22523.
