HistGradientBoosting avoid data shuffling when early_stopping activated #25460
Describe the workflow you want to enable

Hello, it would be useful if the HistGradientBoostingRegressor or HistGradientBoostingClassifier model had the ability to avoid data shuffling when using the early_stopping and validation_fraction parameters, since maintaining data order is a basic requirement when working with time series.

https://github.com/scikit-learn/scikit-learn/blob/98cf537f5/sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py#L427

Describe your proposed solution

It would be sufficient to add a parameter that controls whether or not the data is shuffled.
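To make the request concrete, here is a minimal sketch contrasting a shuffled split (roughly what the estimator does internally at the line linked above, simplified) with the ordered split this proposal asks for. X, y, and the standalone validation_fraction variable are illustrative placeholders, not the estimator's actual internals.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(42)
X = np.arange(100).reshape(-1, 1)  # pretend each row is one time step
y = rng.normal(size=100)
validation_fraction = 0.1

# Roughly the current behavior: rows are shuffled before the split, so the
# validation set mixes past and future observations.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=validation_fraction, random_state=rng
)

# The behavior this issue asks for: keep chronological order and hold out
# the tail as the validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=validation_fraction, shuffle=False
)
```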
Describe alternatives you've considered, if relevant

No response

Additional context

No response
Comments

Indeed, it could even be considered a methodological bug, since the obtained model will not be viable. I am not sure what the best approach is to solve the issue. Adding …

Can I work on this one?

@shamzos As said in my previous message, it is not clear to me what the way forward is. I would wait for comments from @ogrisel and @jeremiedbb.

The callback API discussed by @glemaitre is being drafted in #22000. That work is paused because it is quite complex to get the API right: it needs to allow enough flexibility in early-stopping data splits while remaining intuitive to use, in particular with nested cross-validation for model selection and evaluation…

Why not just add the parameters …

I will submit a PR to introduce a …

There is a related discussion in #18748, where the conclusion is to add …
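Until a parameter lands, one time-series-safe workaround under the current API is to split chronologically yourself, fit with built-in early stopping disabled, and pick the best iteration on the held-out tail using staged_predict. This is a hedged sketch, not an official recipe: the synthetic data and the names n_val and best_n_iter are illustrative.

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for an ordered time series.
rng = np.random.RandomState(0)
X = rng.normal(size=(500, 3))
y = X[:, 0] + rng.normal(scale=0.1, size=500)

n_val = 50  # illustrative size of the chronological validation tail
X_train, X_val = X[:-n_val], X[-n_val:]
y_train, y_val = y[:-n_val], y[-n_val:]

# Disable built-in early stopping so no internal shuffled split happens.
model = HistGradientBoostingRegressor(max_iter=200, early_stopping=False)
model.fit(X_train, y_train)

# Score every boosting stage on the ordered tail and keep the best iteration.
val_errors = [
    mean_squared_error(y_val, y_pred) for y_pred in model.staged_predict(X_val)
]
best_n_iter = int(np.argmin(val_errors)) + 1

# Refit at the selected number of iterations.
final_model = HistGradientBoostingRegressor(
    max_iter=best_n_iter, early_stopping=False
).fit(X_train, y_train)
```

This avoids the internal shuffled split entirely, at the cost of one extra fit.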