[WIP] Add TimeSeriesCV and HomogeneousTimeSeriesCV #6351

yenchenlin · 2016-02-13T05:43:05Z

This PR implements time series cross-validation.
Based on #6322, there are two cases:

Homogeneous time series
Heterogeneous time series

For brevity, the checklist below uses HomoTSCV for homogeneous time series and HeteroTSCV for heterogeneous time series.

Check List:

Reference:
Using k-fold cross-validation for time-series model selection

yenchenlin · 2016-02-13T06:24:16Z

sklearn/model_selection/_split.py

@@ -637,6 +637,121 @@ def split(self, X, y, labels=None):
        """
        return super(StratifiedKFold, self).split(X, y, labels)

+class HomogeneousTimeSeriesCV(_BaseKFold):


Subclassing _BaseKFold is convenient for HomogeneousTimeSeriesCV, because __init__ in _BaseKFold already checks whether the input parameter n_folds is valid.

However, to make HomogeneousTimeSeriesCV work, it has to override split, which is defined in both its superclass _BaseKFold and its grandparent class BaseCrossValidator, because of how those classes implement split.

I don't think overriding split in HomogeneousTimeSeriesCV is a good solution, since the other subclasses of _BaseKFold don't override split; they override _iter_test_indices or _iter_test_masks instead. @rvraghav93, can you offer some suggestions on this?
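To make the trade-off above concrete, here is a minimal pure-Python sketch. ToyBaseKFold and ToyTimeSeriesCV are hypothetical stand-ins, not scikit-learn's classes, and the rule that a time series splitter trains only on samples that precede the test fold is an assumption made for illustration; the PR's actual splitting rule may differ. The point is only that a base split which builds the training set as the complement of the test fold cannot express "train on the past" through the _iter_test_indices hook alone.

```python
class ToyBaseKFold:
    """Hypothetical stand-in for a k-fold base class (not scikit-learn's code)."""

    def __init__(self, n_folds=3):
        # Parameter validation lives in the base class, so subclasses inherit it.
        if not isinstance(n_folds, int) or n_folds < 2:
            raise ValueError("n_folds must be an integer >= 2")
        self.n_folds = n_folds

    def _iter_test_indices(self, n_samples):
        # Contiguous test folds; the first n_samples % n_folds folds
        # receive one extra sample.
        base, extra = divmod(n_samples, self.n_folds)
        start = 0
        for i in range(self.n_folds):
            stop = start + base + (1 if i < extra else 0)
            yield list(range(start, stop))
            start = stop

    def split(self, n_samples):
        # The training set is the complement of the test fold, so it can
        # contain samples that come after the test fold.
        for test in self._iter_test_indices(n_samples):
            held_out = set(test)
            train = [i for i in range(n_samples) if i not in held_out]
            yield train, test


class ToyTimeSeriesCV(ToyBaseKFold):
    """Hypothetical splitter that trains only on samples before the test fold."""

    def split(self, n_samples):
        # Overriding split is needed here because the training set is no
        # longer the complement of the test fold.
        for test in self._iter_test_indices(n_samples):
            if not test or test[0] == 0:
                continue  # skip empty folds and the first fold (no past data)
            train = list(range(test[0]))
            yield train, test


for train, test in ToyTimeSeriesCV(n_folds=3).split(7):
    print(train, test)
```

In this sketch the subclass reuses the inherited n_folds check and the fold boundaries, but still has to replace split, which is the design concern raised in the comment.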

MechCoder · 2016-03-07T18:57:52Z

Please split the PR into two for now. Will be easier to review.

MechCoder · 2016-03-07T19:19:35Z

sklearn/model_selection/_split.py

+
+    Notes
+    -----
+    The first ``n_samples % n_folds`` folds have size


This note is confusing, to say the least.

You should mention that for the first n_samples % n_folds, the number of samples in each fold are "incremented" by n_samples // n_folds + 1.

Sorry maybe I'm too dumb.

the number of samples in each fold are "incremented" by n_samples // n_folds + 1

Why is "incremented" used here?
I think the number of samples in the first n_samples % n_folds folds is exactly n_samples // n_folds + 1,
which is "incremented" by 1 compared to other folds?

Oh, this is related to the previous comment. Sorry, I meant "in each split, the number of samples is incremented by"

You can move this to the n_folds documentation to avoid such confusion
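The fold-size rule discussed in this thread can be sketched as follows. fold_sizes is a hypothetical helper, not part of the PR; it assumes the rule from the docstring note, where the first n_samples % n_folds folds get one extra sample. The running total illustrates one reading of the "incremented" wording: if each split adds one more fold to the training data, the training size grows by the size of that fold.

```python
from itertools import accumulate


def fold_sizes(n_samples, n_folds):
    """Hypothetical helper: sizes of n_folds contiguous folds over n_samples."""
    base, extra = divmod(n_samples, n_folds)
    # The first `extra` (= n_samples % n_folds) folds get one extra sample.
    return [base + 1 if i < extra else base for i in range(n_folds)]


sizes = fold_sizes(10, 3)
cumulative = list(accumulate(sizes))
print(sizes)       # [4, 3, 3]
print(cumulative)  # [4, 7, 10]
```

With 10 samples and 3 folds, only the first fold (10 % 3 = 1) has 10 // 3 + 1 = 4 samples; the other two have 3.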

MechCoder · 2016-03-07T19:29:19Z

Just gave a first pass. I also agree that this would be highly useful!!

yenchenlin · 2016-03-07T23:43:11Z

@MechCoder Thanks!
I will wait for your reply and then separate this.

yenchenlin · 2016-03-24T05:25:45Z

Closing this PR since I plan to split it into two PRs:

Homogeneous time series
Heterogeneous time series

yenchenlin force-pushed the add-homogeneous-time-series-cv branch 2 times, most recently from 75e7401 to 04b9e79 Compare February 13, 2016 06:18

yenchenlin reviewed Feb 13, 2016

yenchenlin force-pushed the add-homogeneous-time-series-cv branch 3 times, most recently from 73f5661 to 62bafb4 Compare February 14, 2016 12:26

yenchenlin added 2 commits February 16, 2016 16:01

Add homogeneous-time-series-cv

1d5ed6d

Add test for HTSCV

78c3dcd

yenchenlin force-pushed the add-homogeneous-time-series-cv branch from 62bafb4 to 78c3dcd Compare February 16, 2016 08:01

yenchenlin mentioned this pull request Mar 2, 2016

Add TimeSeriesCV and HomogeneousTimeSeriesCV #6322

Open

MechCoder reviewed Mar 7, 2016

MechCoder mentioned this pull request Mar 13, 2016

IPython notebook for 10 fold cross-validation with ERT's? mjbommar/scotus-predict#4

Closed

yenchenlin closed this Mar 24, 2016

yenchenlin mentioned this pull request Mar 24, 2016

[MRG+2?] Add homogeneous time series cross validation #6586

Merged

sluofoss mentioned this pull request Sep 26, 2024

FEA Group aware Time-based cross validation #16236

Open
