[MRG] Feature: Additional TimeSeriesSplit Functionality #13204
Conversation
FWIW some of this was discussed in #6322 (which could perhaps be closed as
it predated TimeSeriesSplit)
Thanks, I'll add that to the pull request in case it's relevant.
I found this PR after I made issue #13666. I think the gap is an interesting idea.
Yes, this pull request adds the functionality you are requesting without changing any existing API. Perhaps, as you suggest, more documentation could be written around this, and I would be happy to write some code examples to accompany this change if it is accepted.
Sorry for the slow review. I like your work.
Let's consider caret's approach (https://topepo.github.io/caret/data-splitting.html#data-splitting-for-time-series) for comparison (thanks for the reference @amueller) and consider the use cases of `cross_validate` and `learning_curve`.
- Training set min size: this PR assumes >= test_size; caret requires a minimum to be specified.
- Fixed/max size training set: specified as a number of samples here; specified by the boolean `fixed` in caret.
- Test set overlap: non-overlapping in this PR; consecutive test sets overlap by (test_size - 1) in caret.
- Gap: here removed from the train set; there no support.
- `n_splits`: here must be specified; there implied by test_size and min_train_size.
For `learning_curve` I can see the benefit of having overlapping test sets. I don't really see it as a great idea in cross-validation, though it may not hurt much, and it would help when the dataset is small.
Except for the required test-set overlap and the lack of a gap, I quite like caret's parametrisation. If I was designing this from scratch, I might have (min_train_size, max_train_size (int or 'fixed' or None), test_size, gap); a rough sketch of what such a parametrisation could generate follows after the questions below.
Questions:
- Should we allow n_splits=None if test_size is specified? This would generate as many test sets as possible... does this get too confusing?
- `train_size >= test_size` seems reasonable, but should we allow the user to specify a larger min train size?
- Should we support `max_train_size='fixed'`?
- If we support `n_splits=None`, should we support `test_overlap`, where a value of `-1` would correspond to caret's version? This all feels a bit too complex. Should we consider deprecating `n_splits` and moving to the parametrisation I suggest above?
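For illustration, here is a rough, non-authoritative sketch of what the (min_train_size, max_train_size, test_size, gap) parametrisation suggested above could generate; `walk_forward_splits` and its parameters are hypothetical, not part of any existing API:

```python
import numpy as np

def walk_forward_splits(n_samples, min_train_size, test_size, gap=0,
                        max_train_size=None):
    """Sketch of the proposed parametrisation; n_splits is implied by the
    other parameters rather than specified explicitly."""
    indices = np.arange(n_samples)
    # The first test set starts once min_train_size training samples plus the
    # gap are available; each later test set starts test_size samples further on.
    first_test_start = min_train_size + gap
    for test_start in range(first_test_start, n_samples - test_size + 1, test_size):
        train_end = test_start - gap
        if max_train_size == 'fixed':
            # 'fixed' keeps every training window at exactly min_train_size
            train_start = train_end - min_train_size
        elif max_train_size is not None:
            train_start = max(0, train_end - max_train_size)
        else:
            train_start = 0
        yield indices[train_start:train_end], indices[test_start:test_start + test_size]

# 12 samples, minimum of 4 training samples, 2-sample test sets, 1-sample gap:
for train, test in walk_forward_splits(12, min_train_size=4, test_size=2, gap=1):
    print(train, test)
# [0 1 2 3] [5 6]
# [0 1 2 3 4 5] [7 8]
# [0 1 2 3 4 5 6 7] [9 10]
```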
sklearn/model_selection/_split.py
@@ -799,21 +832,32 @@ def split(self, X, y=None, groups=None):
        n_samples = _num_samples(X)
        n_splits = self.n_splits
        n_folds = n_splits + 1
        gap_size = self.gap_size
        test_size = self.test_size if self.test_size else n_samples // n_folds
Perhaps this should be `if self.test_size is not None` so that 0 and None are not treated the same.
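As a small aside illustrating the point (not part of the PR itself): with a plain truthiness check, `test_size=0` and `test_size=None` both fall back to the default, whereas an explicit `is not None` check only lets `None` do so:

```python
for test_size in (None, 0, 5):
    # truthiness check: 0 and None are indistinguishable
    truthy = test_size if test_size else 'default'
    # explicit check: only None falls back to the default
    explicit = test_size if test_size is not None else 'default'
    print(test_size, truthy, explicit)
# None default default
# 0 default 0
# 5 5 5
```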
Perhaps the else should be `(n_samples - gap) // n_folds` so that train_size remains > gap.
I've updated this to use `self.test_size is not None`. I'm not convinced that the train_size necessarily needs to be constrained to be greater than the gap; should that be a decision left to the user?
Edit: I might have misunderstood, but is `max_train_size` supposed to implement a sliding window? If that's the case please clarify and let's share opinions. I think we can keep it simpler with just `train_size` and `test_size` and more string options for `n_splits`.

My opinion is that I like the idea of having sensible defaults and as much control for the user as possible:

- Add an option `n_splits='walk_fw'`; it should walk forward with the specified `min_train_size` and `test_size` --> keep the functionality of `n_splits` as it is. Idea: also implement walk forward as a sliding window if possible. See more on that below.
- What I would love to see added from caret as well is the idea of a sliding window. Please consider including this as well. Caret calls it `fixedWindow` though. The idea is to include only examples inside a fixed time window and then walk forward and move the window along --> could be implemented along the lines of `if n_splits == 'window' and min_train_size > 1 and test_size > 1`. Here, `test_size` is the prediction horizon of the window, and `min_train_size` is the train size of the window in each fold, which gets moved up by 1 each time.
- Allow the user to specify a `min_train_size` smaller than `gap`, if they so desire (it's a parameter to be explored imo). I agree with you that a reasonable default is `min_train_size >= gap if gap > 0`.
- Allow the user to specify a `min_train_size` both smaller and larger than `test_size`, but keep the default >= `test_size`.
- `max_train_size='fixed'` --> I'm missing something: what is the use of a max_train_size when doing walk forward while growing the train set with the previous test set? Should this implement a sliding window?
- What is the idea behind test set overlap from caret for walk forward in our case? We would have overlap if we walk forward by 1 time unit and our test_size spans more than 1 time unit. Is that the idea? I think it's methodologically OK. I may be missing something though. On that note, also consider the sliding window: we have overlap here for certain if we have a horizon or `test_size > 1`; a rough sliding-window sketch follows below.
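To make the sliding-window idea concrete, here is a minimal sketch (the function name and parameters are hypothetical, loosely mirroring caret's fixedWindow behaviour, not an existing scikit-learn API):

```python
import numpy as np

def sliding_window_splits(n_samples, train_size, test_size, gap=0, step=None):
    """Fixed-size training window that slides forward through the series."""
    # step=1 reproduces caret-style overlapping test sets;
    # step=test_size gives non-overlapping test sets as in this PR.
    step = test_size if step is None else step
    indices = np.arange(n_samples)
    start = 0
    while start + train_size + gap + test_size <= n_samples:
        test_start = start + train_size + gap
        yield (indices[start:start + train_size],
               indices[test_start:test_start + test_size])
        start += step

# 10 samples, window of 4, horizon of 2, gap of 1, sliding by 1 each fold:
for train, test in sliding_window_splits(10, train_size=4, test_size=2, gap=1, step=1):
    print(train, test)
# [0 1 2 3] [5 6]
# [1 2 3 4] [6 7]
# [2 3 4 5] [7 8]
# [3 4 5 6] [8 9]
```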
@jnothman @amueller As some time has elapsed since we last discussed this change, these changes implement the basic functionality necessary for walk-forward cross validation (#14376). However, I do agree with the idea that…
I suspect n_splits, in the sense of "number of train/test pairs to generate", remains a useful parameter for the user to control. But if it gets in the way of usability, we could consider deprecating it.
Thanks @kykosic, I do appreciate this as a minimal change, and am more or less ready to approve it. I just want to ensure that it's enough of a change to be useful.
Requested changes made. I'm not sure what to do about the codecov checks failing after merging master into this branch... EDIT: Latest master commit fixed this apparently.
Any news on this? I wanted to contribute to add exactly this feature, but it was a pleasant surprise to see it already written in a PR :)
Yeah, it's just kinda been hanging in limbo for a year now; hopefully we can get around to merging it, as I keep copying and pasting this class into a lot of projects I work on 😃
An elegant and much-needed addition! I was also looking to contribute like @pepicello, but the problem seems to have been solved. Any idea on when this will be merged?
Yes, I am happy with this as an immediate enhancement. Sorry for the slow reviews.
Please add an entry to the change log at `doc/whats_new/v0.24.rst`. Like the other entries there, please reference this pull request with `:pr:` and credit yourself (and other contributors if applicable) with `:user:`.
    next(splits)


def test_time_series_gap():
These `test_time_series_test_size` and `test_time_series_gap` tests can be combined with `pytest.mark.parametrize`. The two error cases can be in their own test as well. This is not a blocker and can be done in a follow-up PR.
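Something along these lines could work (a sketch only; the expected index lists below were worked out by hand for a 10-sample array and are illustrative rather than copied from the existing tests):

```python
import numpy as np
import pytest
from sklearn.model_selection import TimeSeriesSplit

@pytest.mark.parametrize(
    "kwargs, expected",
    [
        # test_size only
        (dict(n_splits=3, test_size=2),
         [([0, 1, 2, 3], [4, 5]),
          ([0, 1, 2, 3, 4, 5], [6, 7]),
          ([0, 1, 2, 3, 4, 5, 6, 7], [8, 9])]),
        # test_size combined with gap
        (dict(n_splits=2, test_size=3, gap=2),
         [([0, 1], [4, 5, 6]),
          ([0, 1, 2, 3, 4], [7, 8, 9])]),
    ],
)
def test_time_series_split_parametrized(kwargs, expected):
    X = np.zeros((10, 1))
    splits = list(TimeSeriesSplit(**kwargs).split(X))
    assert len(splits) == len(expected)
    for (train, test), (exp_train, exp_test) in zip(splits, expected):
        np.testing.assert_array_equal(train, exp_train)
        np.testing.assert_array_equal(test, exp_test)
```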
I would be happy to implement this suggestion in a follow-up PR.
A few more comments. Otherwise LGTM.
Thank you @kykosic!
Almost exactly 10,000 issues after the first walk-forward issue in 2014! #3202 🍾 Thank you for contributing! 🙏
Reference Issues/PRs
Related to #6322
Resolves #14376
What does this implement/fix? Explain your changes.
There are two parameters being added to `TimeSeriesSplit` which make it more flexible, particularly when working with financial data.

1. `test_size` : int, optional

By default, `TimeSeriesSplit` divides the data into `n_splits + 1` folds, and the size of each split's test set is `n_samples / (n_splits + 1)`. However, it is sometimes useful to shift the balance between train and test data when doing cross validation relative to domain use cases.

Example use case: I have 4 years (4 * 252 trading days) of stock price data. I wish to run cross-validation with 4 splits. Each split I want to train on 2 years (504 samples) and test on 6 months (126 samples). Currently, using `TimeSeriesSplit(n_splits=4, max_train_size=504)` will only yield test sets of exactly 201 samples, and train sets starting at 204 samples and increasing up to 504.

The optional `test_size` parameter allows me to control the number of samples used for each test split and use all remaining samples for training. See the code docstring for a more visual example. `test_size=None` by default preserves the current functionality.
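A sketch of that use case with the parameter added by this PR (sample counts as in the example above; the printed sizes were worked out by hand, so treat them as illustrative):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.zeros((4 * 252, 5))  # 4 years of daily features

# Train on the most recent 504 samples and test on the following 126, 4 times.
tscv = TimeSeriesSplit(n_splits=4, test_size=126, max_train_size=504)
for train_idx, test_idx in tscv.split(X):
    print(len(train_idx), len(test_idx))  # 504 126 for every split
# Without test_size, each test set would instead hold
# n_samples // (n_splits + 1) = 201 samples.
```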
2. `gap` : int, default=0

Currently each train/test set is adjacent to the next in the sequence. However, for some use cases it may be important to remove "gap samples" between each train set and test set when look-ahead bias can result from labeling the data.

Example use case: I am building a classifier to buy/sell stocks. I have some input features for each day, and I label my data by looking ahead 2 days and seeing if the stock price increases or decreases. Since I am looking ahead 2 days to create my data labels, I need to insert a 2-day "gap" between my train and test sets for cross validation to prevent look-ahead bias in my validation score.

If I label my data by looking ahead 2 days, then this is not representative of real-life training: I would not have those 2 extra days to label my data if I were training a model today. I need to remove the last two elements of each training set to get an authentic cross validation score.
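A toy-sized sketch of that behaviour using the `gap` parameter added here (indices worked out by hand, so treat them as illustrative):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(10).reshape(-1, 1)

# Labels look ahead 2 steps, so leave a 2-sample gap between train and test.
tscv = TimeSeriesSplit(n_splits=3, test_size=2, gap=2)
for train_idx, test_idx in tscv.split(X):
    print(train_idx, test_idx)
# [0 1] [4 5]
# [0 1 2 3] [6 7]
# [0 1 2 3 4 5] [8 9]
```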
The default value of `gap=0` maintains the current functionality.

Any other comments?