Description
I get asked about this roughly once a day, so I think we should just add it.
Many people work with time series, and adding cross-validation for them would be really easy.
The standard strategy is described for example here
There are basically two cases: homogeneous time series (one sample every X seconds / days), or heterogeneous time series, where each sample has a time stamp.
For the homogeneous case, we can just put the first n_samples // n_folds samples in the first fold, and so on, so it's a very simple variation of KFold. Fixed in #6586.
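To make the homogeneous case concrete, here is a rough sketch of what I have in mind. The function name `homogeneous_time_series_folds` is made up for illustration and is not an existing scikit-learn API. I'm also assuming the usual "train on everything before, test on the next chunk" scheme and that leftover samples go into the last test fold.

```python
import numpy as np


def homogeneous_time_series_folds(n_samples, n_folds):
    """Yield (train_indices, test_indices) for evenly spaced samples.

    Hypothetical helper: samples are assumed to be already in time order.
    """
    fold_size = n_samples // n_folds
    indices = np.arange(n_samples)
    # Fold 0 is only ever used for training, so there are n_folds - 1 splits.
    for k in range(1, n_folds):
        train_end = k * fold_size
        # Assumption: the last test fold absorbs any remainder samples.
        test_end = n_samples if k == n_folds - 1 else (k + 1) * fold_size
        yield indices[:train_end], indices[train_end:test_end]


for train, test in homogeneous_time_series_folds(n_samples=10, n_folds=3):
    print("train:", train, "test:", test)
```

Every test index comes strictly after every train index, which is the only real difference from plain KFold.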
For the heterogeneous case, we need to get a labels array and split accordingly. If we cast that to integers, people could actually provide pandas time series, and they would be handled correctly (they will be converted to nanoseconds).
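A possible sketch for the heterogeneous case, again with a made-up name (`heterogeneous_time_series_folds`). "Split accordingly" could mean several things; here I'm assuming we sort samples by their integer time label and then chunk the sorted order by sample count, same as above. I use NumPy `datetime64[ns]` values to stand in for pandas timestamps, so the integer cast gives nanoseconds since the epoch.

```python
import numpy as np


def heterogeneous_time_series_folds(timestamps, n_folds):
    """Yield (train_indices, test_indices) for time-stamped samples.

    Hypothetical helper: `timestamps` is anything NumPy can read as
    datetime64[ns]; indices refer to the original (unsorted) positions.
    """
    # Cast the time stamps to integer labels (nanoseconds since the epoch).
    labels = np.asarray(timestamps, dtype="datetime64[ns]").astype(np.int64)
    # Assumption: split by rank in time, not by equal-width time windows.
    order = np.argsort(labels, kind="stable")
    fold_size = len(order) // n_folds
    for k in range(1, n_folds):
        train_end = k * fold_size
        test_end = len(order) if k == n_folds - 1 else (k + 1) * fold_size
        yield order[:train_end], order[train_end:test_end]


stamps = ["2016-01-05", "2016-01-01", "2016-01-03",
          "2016-01-02", "2016-01-09", "2016-01-07"]
for train, test in heterogeneous_time_series_folds(stamps, n_folds=3):
    print("train:", train, "test:", test)
```

Note that this sketch does nothing special for ties, so samples sharing a time stamp can end up on both sides of a split. Cutting on the label values rather than on sample counts would be the alternative.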
I remember arguing against this addition, but I changed my mind ;)