Add TimeSeriesCV and HomogeneousTimeSeriesCV · Issue #6322 · scikit-learn/scikit-learn · GitHub
Add TimeSeriesCV and HomogeneousTimeSeriesCV #6322
Open
@amueller

Description

I get asked about this roughly once a day, so I think we should just add it.
Many people work with time series, and adding cross-validation for them would be really easy.
The standard strategy is described for example here

There are basically two cases: homogeneous time series (one sample every X seconds / days), or heterogeneous time series, where each sample has a time stamp.

For the homogeneous case, we can just put the first `n_samples // n_folds` samples in the first fold, the next `n_samples // n_folds` in the second, and so on, so it's a very simple variation of KFold. Fixed in #6586.
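To make the homogeneous case concrete, here is a minimal sketch of one way such a splitter could work. It assumes the forward-chaining reading of the strategy (train on all earlier folds, test on the next one). The function name `homogeneous_time_series_splits` is hypothetical and is not scikit-learn API.

```python
import numpy as np


def homogeneous_time_series_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices) pairs for an evenly sampled series.

    Hypothetical helper, not part of scikit-learn. Samples are assumed to be
    already ordered in time. Fold k holds the k-th contiguous block of
    n_samples // n_folds samples; the last fold also takes any remainder.
    """
    fold_size = n_samples // n_folds
    if n_folds < 2 or fold_size < 1:
        raise ValueError("need n_folds >= 2 and at least one sample per fold")

    indices = np.arange(n_samples)
    # Assumption: forward chaining. Train on folds 0..k-1, test on fold k,
    # so the model is never evaluated on data older than its training data.
    for k in range(1, n_folds):
        train_end = k * fold_size
        test_end = (k + 1) * fold_size if k < n_folds - 1 else n_samples
        yield indices[:train_end], indices[train_end:test_end]


for train, test in homogeneous_time_series_splits(n_samples=10, n_folds=3):
    print(train, test)
```

Unlike plain KFold, the first fold is never used as a test set here, so `n_folds` folds give `n_folds - 1` splits.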

For the heterogeneous case, we need to get a labels array holding each sample's time stamp and split accordingly. If we cast that to integers, people could actually provide pandas time series, and they would be handled correctly (the time stamps will be converted to nanoseconds).
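The heterogeneous case could be sketched along these lines. This is only an illustration under stated assumptions: the name `heterogeneous_time_series_splits` is hypothetical, and cutting the time-sorted samples into equally sized blocks is one possible reading of "split accordingly" (cutting into equal time intervals would be another).

```python
import numpy as np


def heterogeneous_time_series_splits(timestamps, n_folds):
    """Yield (train_indices, test_indices) pairs from per-sample time stamps.

    Hypothetical helper, not part of scikit-learn. `timestamps` can be
    anything NumPy can convert to datetime64[ns], such as ISO date strings
    or a timezone-naive pandas datetime Series.
    """
    if n_folds < 2:
        raise ValueError("need n_folds >= 2")

    # Cast the time stamps to integers: nanoseconds since the Unix epoch.
    t = np.asarray(timestamps, dtype="datetime64[ns]").astype(np.int64)

    # Assumption: order samples by time, then cut into n_folds blocks of
    # (nearly) equal sample count. Identical time stamps may straddle a cut.
    order = np.argsort(t, kind="stable")
    blocks = np.array_split(order, n_folds)

    # Forward chaining: train on all earlier blocks, test on the next one.
    for k in range(1, n_folds):
        yield np.concatenate(blocks[:k]), blocks[k]


stamps = ["2016-01-03", "2016-01-01", "2016-01-10",
          "2016-01-02", "2016-01-07", "2016-01-05"]
for train, test in heterogeneous_time_series_splits(stamps, n_folds=3):
    print(train, test)
```

The returned indices refer to the original, unsorted sample order, so the input data does not need to be sorted beforehand.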

I remember arguing against this addition, but I changed my mind ;)
