8000 Combinatorial Purged Cross-Validation strategy · Issue #22229 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

Combinatorial Purged Cross-Validation strategy #22229

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
r-matsuzaka opened this issue Jan 17, 2022 · 9 comments
Closed

Combinatorial Purged Cross-Validation strategy #22229

r-matsuzaka opened this issue Jan 17, 2022 · 9 comments
Labels
Needs Triage Issue requires triage New Feature

Comments

@r-matsuzaka
Copy link

Describe the workflow you want to enable

Hi.
Is it worth adding CPCV(Combinatorial Purged Cross-Validation) in the list of model_selection members?

About CPCV:
https://stats.stackexchange.com/questions/443159/what-is-combinatorial-purged-cross-validation-for-time-series-data

Describe your proposed solution

If it is welcomed, I am happy to make a pull request.

Describe alternatives you've considered, if relevant

No response

Additional context

No response

@r-matsuzaka r-matsuzaka added Needs Triage Issue requires triage New Feature labels Jan 17, 2022
@Micky774
Copy link
Contributor

Just to add context, it looks like the author of this post has an implementation that is already close to scikit-learn API. This may make implementation, if deemed worthwhile, simpler.

@r-matsuzaka
Copy link
Author
r-matsuzaka commented Jan 18, 2022

But it is no more maintained.
I contacted author.
And most of test codes do not work.
So I need to refactor it if I utilise it.
It seems that the feature does not match concept of scikit learn from reactuon, so I will close it.

@r-matsuzaka r-matsuzaka reopened this Jan 25, 2022
@glemaitre glemaitre changed the title About CPCV Combinatorial Purged Cross-Validation strategy Jan 27, 2022
@glemaitre
Copy link
Member

It is weird to me to have a time series split that uses future data.

Also, I think that scikit-learn should be limited to TimeSeriesSplit since we are not currently tackling properly forecasting or providing a preprocessor to make time-series classification or regression.

@r-matsuzaka
Copy link
Author
r-matsuzaka commented Jan 27, 2022

Thank you very much for commenting.

It is weird to me to have a time series split that uses future data.

For financial problems, return or loss is defined with the values at different time points.
That may cause data leakage.
So purging is needed to avoid over-fitting.

Also, I think that scikit-learn should be limited to TimeSeriesSplit since we are not currently tackling properly forecasting or providing a preprocessor to make time-series classification or regression.

Ok. If this implementation is beneficial for limited people, I will close it.

@glemaitre
Copy link
Member

Ok. If this implementation is beneficial for limited people, I will close it.

However, I think that we have a couple of PR or issues related to TimeSeriesSplit that could at least be beneficial if you want to have a look.

@r-matsuzaka
Copy link
Author

However, I think that we have a couple of PR or issues related to TimeSeriesSplit that could at least be beneficial if you want to have a look.

Could you tell me?
I can help to solve issue.

@glemaitre
Copy link
Member

One such example: #14257

@r-matsuzaka
Copy link
Author

Thanks

@AhmedThahir
Copy link

For financial problems, return or loss is defined with the values at different time points. That may cause data leakage. So purging is needed to avoid over-fitting.

  • combinatoral part makes sense
  • purging makes sense
  • embargo makes sense
  • But, as @glemaitre pointed out, it seems odd that future data is used in training set

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Triage Issue requires triage New Feature
Projects
None yet
Development

No branches or pull requests

4 participants
0