Combinatorial Purged Cross-Validation strategy #22229

r-matsuzaka · 2022-01-17T01:16:29Z

Describe the workflow you want to enable

Hi.
Is it worth adding CPCV (Combinatorial Purged Cross-Validation) to scikit-learn's model_selection module?

About CPCV:
https://stats.stackexchange.com/questions/443159/what-is-combinatorial-purged-cross-validation-for-time-series-data

Describe your proposed solution

If it is welcomed, I am happy to make a pull request.

Describe alternatives you've considered, if relevant

No response

Additional context

No response


Micky774 · 2022-01-17T15:35:05Z

Just to add context: the author of this post appears to have an implementation that is already close to the scikit-learn API. If the feature is deemed worthwhile, that may make implementing it simpler.

r-matsuzaka · 2022-01-18T11:03:34Z

But it is no longer maintained.
I contacted the author.
Most of the test code does not work, so I would need to refactor it before using it.
Judging from the reactions, the feature does not seem to fit the scope of scikit-learn, so I will close this issue.

glemaitre · 2022-01-27T09:30:17Z

It is weird to me to have a time series split that uses future data.

Also, I think that scikit-learn should be limited to TimeSeriesSplit, since we are not currently tackling forecasting properly, nor providing a preprocessor for time-series classification or regression.
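For contrast with CPCV, a forward-only split of the kind TimeSeriesSplit provides never trains on data that comes after the test block. The sketch below is a simplified illustration in the spirit of TimeSeriesSplit and its gap parameter, not scikit-learn's actual code; the helper name forward_splits and its fold arithmetic are assumptions for illustration.

```python
def forward_splits(n_samples, n_splits, gap=0):
    # Hypothetical helper: a simplified sketch in the spirit of
    # sklearn.model_selection.TimeSeriesSplit(n_splits=..., gap=...),
    # not scikit-learn's actual implementation.
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("too few samples for the requested number of splits")
    for test_start in range(n_samples - n_splits * test_size, n_samples, test_size):
        # Expanding window: train on everything before the test block,
        # minus `gap` samples right before it (to limit label leakage).
        train = list(range(0, max(test_start - gap, 0)))
        test = list(range(test_start, test_start + test_size))
        yield train, test


for train, test in forward_splits(10, n_splits=3, gap=1):
    print(train, test)
# [0, 1, 2] [4, 5]
# [0, 1, 2, 3, 4] [6, 7]
# [0, 1, 2, 3, 4, 5, 6] [8, 9]
```

Every training index precedes every test index, which is exactly the property a combinatorial scheme gives up.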

r-matsuzaka · 2022-01-27T13:36:28Z

Thank you very much for commenting.

> It is weird to me to have a time series split that uses future data.

In financial problems, a return or loss is defined using values at different time points, so the label of a training sample can overlap in time with the labels of test samples.
That may cause data leakage, so purging is needed to avoid over-fitting.
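Purging can be sketched as follows, assuming each sample i is observed at t0[i] and its return or loss is realized at t1[i]. The helper name purge_train_indices is hypothetical and the closed-interval overlap rule is an assumption; this is a minimal illustration, not the linked implementation.

```python
def purge_train_indices(train_idx, test_idx, t0, t1):
    """Return the training indices whose label interval overlaps no test label.

    Hypothetical helper. Assumes sample i is observed at t0[i] and its
    return/loss is realized at t1[i]; intervals are treated as closed.
    """
    test_intervals = [(t0[j], t1[j]) for j in test_idx]
    kept = []
    for i in train_idx:
        # Two closed intervals [a, b] and [c, d] overlap iff a <= d and c <= b.
        overlaps = any(t0[i] <= end and start <= t1[i] for start, end in test_intervals)
        if not overlaps:
            kept.append(i)
    return kept


# Ten samples whose return is measured two steps ahead.
t0 = list(range(10))
t1 = [t + 2 for t in t0]
test_idx = [4, 5]
train_idx = [i for i in range(10) if i not in test_idx]
print(purge_train_indices(train_idx, test_idx, t0, t1))  # [0, 1, 8, 9]
```

Samples 2, 3, 6 and 7 are dropped because their two-step labels share time points with the labels of test samples 4 and 5.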

> Also, I think that scikit-learn should be limited to TimeSeriesSplit, since we are not currently tackling forecasting properly, nor providing a preprocessor for time-series classification or regression.

OK. If this implementation would only benefit a limited number of people, I will close it.

glemaitre · 2022-01-27T13:38:13Z

> OK. If this implementation would only benefit a limited number of people, I will close it.

However, we have a couple of PRs or issues related to TimeSeriesSplit that would be worth working on, if you want to have a look.

r-matsuzaka · 2022-01-27T13:52:42Z

> However, we have a couple of PRs or issues related to TimeSeriesSplit that would be worth working on, if you want to have a look.

Could you point me to them?
I can help solve those issues.

glemaitre · 2022-01-29T12:45:23Z

One such example: #14257

r-matsuzaka · 2022-01-29T13:47:55Z

Thanks

AhmedThahir · 2025-03-16T06:42:57Z

> In financial problems, a return or loss is defined using values at different time points. That may cause data leakage, so purging is needed to avoid over-fitting.

The combinatorial part makes sense.
Purging makes sense.
Embargo makes sense.
But, as @glemaitre pointed out, it seems odd that future data is used in the training set.
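The three ingredients can be combined in a short sketch that also shows where the future data comes from. This is an illustration under simplifying assumptions, not the implementation from the linked post or from #22273: samples are time-ordered and split into n_groups contiguous blocks, every combination of n_test_groups blocks is used once as the test set, the label of sample i is assumed to span samples i through i + label_horizon, and embargo is a sample count. The function and parameter names are hypothetical.

```python
from itertools import combinations


def cpcv_splits(n_samples, n_groups, n_test_groups, label_horizon=0, embargo=0):
    """Yield (train, test) index lists for a CPCV-style scheme (sketch only).

    Hypothetical helper; assumes time-ordered samples whose labels span
    samples i .. i + label_horizon, and an embargo counted in samples.
    """
    # Contiguous, nearly equal-sized groups: group g is [bounds[g], bounds[g+1]).
    bounds = [n_samples * g // n_groups for g in range(n_groups + 1)]
    # Combinatorial part: every choice of n_test_groups groups is a test set.
    for test_groups in combinations(range(n_groups), n_test_groups):
        test, excluded = [], set()
        for g in test_groups:
            start, stop = bounds[g], bounds[g + 1]
            test.extend(range(start, stop))
            # Purging: drop training samples whose label [i, i + label_horizon]
            # overlaps a test label, i.e. i within label_horizon of the block.
            lo = max(start - label_horizon, 0)
            # Embargo: also drop `embargo` samples after the purged zone.
            hi = min(stop + label_horizon + embargo, n_samples)
            excluded.update(range(lo, hi))
        # Everything else trains, including samples after the test block.
        train = [i for i in range(n_samples) if i not in excluded]
        yield train, sorted(test)


splits = list(cpcv_splits(12, n_groups=6, n_test_groups=2, label_horizon=1, embargo=1))
print(len(splits))  # 15 = C(6, 2)
print(splits[2])    # test groups (0, 3): ([4, 10, 11], [0, 1, 6, 7])
```

In the split printed above, samples 10 and 11 come after every test sample, and in the first split (test groups 0 and 1) the whole training set, samples 6 to 11, lies in the future of the test set. That is the behavior being questioned here.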

r-matsuzaka added the Needs Triage and New Feature labels Jan 17, 2022

r-matsuzaka closed this as completed Jan 18, 2022

r-matsuzaka mentioned this issue Jan 23, 2022

FEA implementation of Combinatorial Purged Cross-Validation strategy #22273

Closed

r-matsuzaka reopened this Jan 25, 2022

glemaitre changed the title ~~About CPCV~~ Combinatorial Purged Cross-Validation strategy Jan 27, 2022

glemaitre closed this as completed Jan 29, 2022

MichaelKarpe mentioned this issue Oct 7, 2022

ENH added RollingWindowCV to sklearn.model_selection #24589

Closed
