Add rolling window to sklearn.model_selection.TimeSeriesSplit #22523
Comments
Currently there is rolling window support, where the train set does not grow:
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
x = np.arange(15)
cv = TimeSeriesSplit(max_train_size=3, test_size=1)
for train_index, test_index in cv.split(x):
    print("TRAIN:", train_index, "TEST:", test_index)
# TRAIN: [7 8 9] TEST: [10]
# TRAIN: [ 8  9 10] TEST: [11]
# TRAIN: [ 9 10 11] TEST: [12]
# TRAIN: [10 11 12] TEST: [13]
# TRAIN: [11 12 13] TEST: [14]
If we want all the windows, we would need to adjust n_splits explicitly:
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
x = np.arange(15)
cv = TimeSeriesSplit(n_splits=12, max_train_size=3, test_size=1)
for train_index, test_index in cv.split(x):
    print("TRAIN:", train_index, "TEST:", test_index)
# TRAIN: [0 1 2] TEST: [3]
# TRAIN: [1 2 3] TEST: [4]
# TRAIN: [2 3 4] TEST: [5]
# TRAIN: [3 4 5] TEST: [6]
# TRAIN: [4 5 6] TEST: [7]
# TRAIN: [5 6 7] TEST: [8]
# TRAIN: [6 7 8] TEST: [9]
# TRAIN: [7 8 9] TEST: [10]
# TRAIN: [ 8  9 10] TEST: [11]
# TRAIN: [ 9 10 11] TEST: [12]
# TRAIN: [10 11 12] TEST: [13]
# TRAIN: [11 12 13] TEST: [14]
Is the proposal to have n_splits='walk_fw' provide all the windows automatically?
|
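Until something like that exists, the n_splits value needed to cover every full window can be worked out by hand. A minimal sketch, assuming the goal is the largest n_splits whose first fold still has a complete training window; `n_splits_for_all_windows` is a hypothetical helper, not part of scikit-learn:

```python
def n_splits_for_all_windows(n_samples, max_train_size, test_size=1, gap=0):
    """Return the largest n_splits whose first fold still has a full training window.

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    # Observations left once the first full training window (and any gap) is reserved.
    usable = n_samples - max_train_size - gap
    if usable < test_size:
        raise ValueError("not enough samples for a single full window")
    # Each split consumes test_size observations from the end of the series.
    return usable // test_size


n_splits = n_splits_for_all_windows(n_samples=15, max_train_size=3, test_size=1)
print(n_splits)  # 12, the value passed explicitly in the example above
```

The result could then be passed on as TimeSeriesSplit(n_splits=n_splits, max_train_size=3, test_size=1).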
Yes. Ideally, the `n_splits` parameter could be done away with?
The implementation then reduces to how many observations are used in the rolling training window. For example, assuming daily observations for a given financial time series, you could train a model on a rolling 252-day training window and validate on a 63-day window, walking forward by the validation size.
On Thu, Apr 14, 2022 at 2:24 PM Thomas J. Fan ***@***.***> wrote:
Currently there is rolling window support, where the train set does not grow:
from sklearn.model_selection import TimeSeriesSplit
x = np.arange(15)
cv = TimeSeriesSplit(max_train_size=3, test_size=1)
for train_index, test_index in cv.split(x):
    print("TRAIN:", train_index, "TEST:", test_index)
# TRAIN: [7 8 9] TEST: [10]
# TRAIN: [ 8  9 10] TEST: [11]
# TRAIN: [ 9 10 11] TEST: [12]
# TRAIN: [10 11 12] TEST: [13]
# TRAIN: [11 12 13] TEST: [14]
If we want all the windows, we would need to adjust n_splits explicitly:
from sklearn.model_selection import TimeSeriesSplit
x = np.arange(15)
cv = TimeSeriesSplit(n_splits=12, max_train_size=3, test_size=1)
for train_index, test_index in cv.split(x):
    print("TRAIN:", train_index, "TEST:", test_index)
# TRAIN: [0 1 2] TEST: [3]
# TRAIN: [1 2 3] TEST: [4]
# TRAIN: [2 3 4] TEST: [5]
# TRAIN: [3 4 5] TEST: [6]
# TRAIN: [4 5 6] TEST: [7]
# TRAIN: [5 6 7] TEST: [8]
# TRAIN: [6 7 8] TEST: [9]
# TRAIN: [7 8 9] TEST: [10]
# TRAIN: [ 8  9 10] TEST: [11]
# TRAIN: [ 9 10 11] TEST: [12]
# TRAIN: [10 11 12] TEST: [13]
# TRAIN: [11 12 13] TEST: [14]
Is the proposal to have n_splits='walk_fw' provide all the windows automatically?
|
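The walk-forward scheme described in the reply above (fixed training window, stride equal to the validation size) can be sketched without n_splits at all. This is an illustrative stdlib-only generator, not scikit-learn code; the name `walk_forward_splits` and the 1008-observation series length are assumptions made for the example:

```python
def walk_forward_splits(n_samples, train_size, test_size):
    """Yield (train, test) index ranges for a fixed-size rolling window.

    Hypothetical sketch: the window walks forward by test_size each step.
    """
    start = 0
    # Stop as soon as a full train + test window no longer fits in the series.
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size


# Assumed series length: 1008 daily observations (about four trading years).
for train, test in walk_forward_splits(1008, train_size=252, test_size=63):
    print(f"train [{train.start}, {train.stop})  test [{test.start}, {test.stop})")
```

Here the number of splits falls out of the window sizes rather than being chosen up front, which is the behaviour the reply is asking for.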
So would you accept a PR where the Something like this?
|
We can not remove |
So would setting I ask because as you mentioned, rolling window support is already included, just not for all windows.
|
For example, if
x = np.arange(15)
cv = TimeSeriesSplit(n_splits=4, max_train_size=10, test_size=1)
for train_index, test_index in cv.split(x):
    print("TRAIN:", train_index, "TEST:", test_index)
|
I have created a class for this in the past for my own usage in personal projects where the calculations for max_train_size and test_size are done automatically given the number of splits desired and the proportion of the window allocated to validation. The calculation works out that the window size is equal to the number of samples times the reciprocal of
For example: given a validation proportion of
Another example using the values of previous discussions: if we set
In my use case I also needed support for longitudinal data, thus the class allows for a time column to be used for window definition as well. An example of the class applied to multiple stocks is shown below.
The code for the class and PR for scikit-learn inclusion are both given at #24589. If inclusion is decided against you may copy the class into a python file, remove the _BaseKFold super, and add an n_splits getter. The only 3 requirements when used as a standalone module are as follows:
|
To add the
|
Hi, what is the status of this? This feature would be very useful. @msat59's code seems to work for me. |
Any updates? |
Describe the workflow you want to enable
I wanted to ask whether any plans exist to implement a rolling/sliding window method in the TimeSeriesSplit class:
Currently, we are limited to using the expanding window type. For many financial time series models where a feature experiences a structural break, having a model whose weights are trained on the entire history can prove suboptimal.
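To make the distinction concrete, here is a small stdlib-only sketch contrasting the two window types for the same test positions. `train_window` is a hypothetical helper written for this illustration; it is not scikit-learn's implementation:

```python
def train_window(test_start, max_train_size=None):
    """Return the training index range that ends just before test_start.

    Hypothetical helper: with no cap the window expands from index 0,
    with a cap it keeps only the most recent max_train_size observations.
    """
    if max_train_size is None or max_train_size >= test_start:
        return range(0, test_start)  # expanding window
    return range(test_start - max_train_size, test_start)  # rolling window


for test_start in (3, 4, 5):
    expanding = list(train_window(test_start))
    rolling = list(train_window(test_start, max_train_size=3))
    print(f"test={test_start}  expanding={expanding}  rolling={rolling}")
```

With the expanding window every fold retrains on the full history, while the rolling window drops the oldest observation each step, which is what matters after a structural break.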
I noted in #13204, specifically svenstehle's comments, that this might be on the horizon?
Describe your proposed solution
Current Implementation
Desired outcome
Where the 'stride' of the walk forward is proportionate to the test set, or walks by the max_train_size parameter?
Describe alternatives you've considered, if relevant
No response
Additional context
No response