GBDT support custom validation set #15127
Hi @qinhanmin2014, just to clarify: are you talking about the BaseHistGradientBoosting class, where the validation split takes place? P.S. I want to work on this and just needed the clarification.
Yes, something like xgb/lgb/ctb. Be careful: there's only +1 from me (we need +2 before making the decision).
This could be used for any estimator that implements early stopping, not just the GBDTs. What would the API look like? New arguments to fit? Clearly that's not a trivial decision / design ;)
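For illustration only, here are two API shapes that have been used elsewhere. Both are hypothetical for scikit-learn: the eval_set fit parameter mirrors the xgboost/lightgbm convention, and the validation_set constructor parameter is a made-up alternative.

```python
from sklearn.ensemble import HistGradientBoostingClassifier

# Candidate A: a new fit parameter, following the xgboost/lightgbm
# convention (hypothetical for scikit-learn):
#   est.fit(X_train, y_train, eval_set=[(X_val, y_val)])
#
# Candidate B: a new constructor parameter holding the validation data
# (hypothetical):
#   est = HistGradientBoostingClassifier(validation_set=(X_val, y_val))
est = HistGradientBoostingClassifier(early_stopping=True)
```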
@NicolasHug @TomDLT
That's a very good point. Assuming the features are preprocessed by a pipeline, that would make the API quite complex to get right. For instance, assume that scikit-learn had a rebalancing meta-estimator (such as …). I don't think it's possible or desirable to have an API to handle this methodological point automagically. But we should definitely make it possible to pass this kind of pre-computed validation set, and have an example to document this kind of pipeline.
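A minimal sketch of the routing problem described above: if eval_set were passed through a Pipeline as a plain fit parameter, the validation data would reach the final estimator untransformed. The eval_set argument in the commented call is hypothetical; the manual transform is what a user would need to do today.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)

# To be methodologically correct, the preprocessing must be fitted on the
# training data only, then applied to the validation set as well:
scaler = StandardScaler().fit(X_train)
gbdt = HistGradientBoostingClassifier(early_stopping=True)
# gbdt.fit(scaler.transform(X_train), y_train,
#          eval_set=[(scaler.transform(X_val), y_val)])  # hypothetical API
gbdt.fit(scaler.transform(X_train), y_train)  # what works today
```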
How about joining forces with https://github.com/keras-team/keras and/or folding into tensorflow?
I tried to set up a grid search with GBDT from scikit-learn and XGBoost (using the custom validation set) in order to compare them. However, passing eval_set in fit results in a somewhat expected error, so I could only use either GBDT or XGBoost in a grid search, but not both estimators in the same run. Perhaps, as a first step of this issue, the "eval_set" kwarg could be accepted in fit, even if it isn't (yet) used internally for early stopping, so that one can at least directly compare XGBoost models with scikit-learn ones.
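A minimal reproduction of the mismatch, assuming the current xgboost and scikit-learn fit signatures (the exact exception text may differ across versions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# xgboost accepts a custom validation set as a fit parameter:
XGBClassifier().fit(X_train, y_train, eval_set=[(X_val, y_val)])

# scikit-learn's GBDT does not, so the two estimators cannot share the same
# fit parameters in a single grid search run:
HistGradientBoostingClassifier().fit(X_train, y_train, eval_set=[(X_val, y_val)])
# -> TypeError: fit() got an unexpected keyword argument 'eval_set'
```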
Let's centralize the discussion and join #18748.
Is it reasonable to support a custom validation set in GBDT? Currently we do train_test_split internally, but I think sometimes users want to do the split themselves (e.g., use the first 80% of the dataset as the training set and the rest as the validation set).
xgboost, lightgbm and catboost all support a custom validation set.
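For concreteness, a sketch of the gap using the current scikit-learn API: a time-ordered split is easy to build with shuffle=False, but the estimator only offers an internal shuffled split controlled by validation_fraction, with no way to consume the custom one.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

# The split the user wants: first 80% as training set, last 20% as
# validation set (e.g. for time-ordered data):
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

# What is available today: an internal split controlled by
# validation_fraction; the custom (X_val, y_val) above cannot be passed in.
clf = HistGradientBoostingClassifier(early_stopping=True, validation_fraction=0.2)
clf.fit(X_train, y_train)
```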