[WIP] GradientBoostingClassifierCV without early stopping #8226
Sample groups for the cross-validation splitter.
"""
if isinstance(self.cv_n_estimators, (numbers.Integral, np.integer)):
    print('heee')
!
Arghh sorry forgot to remove the scaffold :@
I think this is probably useful in practice. However, I think adding a `use_warm_start` parameter to `GridSearchCV` would automatically handle this case, RandomForests, SGD, etc. without defining a new API. WDYT?
Otherwise, please add this to the list in `doc/modules/grid_search.rst`, and to appropriate "see also"s.
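To make the warm-start idea concrete, here is a minimal sketch of what a warm-started search over `n_estimators` amounts to on a single train/test split. It only uses the existing `warm_start` option of `GradientBoostingClassifier`; the `use_warm_start` parameter proposed above is hypothetical, and the grid values and dataset are made up for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Hypothetical grid; it must be increasing for warm start to help.
n_estimators_grid = [10, 20, 40]
est = GradientBoostingClassifier(warm_start=True, random_state=0)

scores = []
for n in n_estimators_grid:
    # With warm_start=True, fit() keeps the trees already built and only
    # adds the missing ones instead of refitting from scratch.
    est.set_params(n_estimators=n)
    est.fit(X_train, y_train)
    scores.append(est.score(X_test, y_test))

best_n = n_estimators_grid[int(np.argmax(scores))]
```

Each pass through the loop fits only the additional trees, so the total cost is roughly that of one fit with the largest `n_estimators` in the grid.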
@@ -0,0 +1,77 @@
"""
I'm not convinced this example is worth having. A nice benchmark, but users don't gain from playing with it; as much can be said ("Will always improve performance over GridSearchCV for searching over n_estimators") in narrative docs and what's new.
Ok, I'll add it as a gist snippet at the PR description...
""" | ||
if isinstance(self.cv_n_estimators, (numbers.Integral, np.integer)):
    print('heee')
    cv_n_estimators = np.array([self.cv_n_estimators, ], dtype=np.int)
perhaps this case should be interpreted as range(1, cv_n_estimators + 1)
Ah yea. That would be more useful...
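A small sketch of the interpretation suggested above, where an integer `n` is expanded to the stages `1, ..., n` and an array-like is taken as an explicit list of stages. The helper name `check_n_estimators_range` and its validation rules are assumptions for illustration, not code from this PR.

```python
import numbers

import numpy as np


def check_n_estimators_range(n_estimators_range):
    """Hypothetical helper: return a sorted integer array of stage counts."""
    if isinstance(n_estimators_range, (numbers.Integral, np.integer)):
        if n_estimators_range < 1:
            raise ValueError("an integer n_estimators_range must be >= 1")
        # An integer n is read as "try every stage from 1 to n".
        return np.arange(1, n_estimators_range + 1)

    # An array-like is read as an explicit list of stages;
    # np.unique sorts it and drops duplicates.
    stages = np.unique(np.asarray(n_estimators_range, dtype=int))
    if stages.size == 0 or stages[0] < 1:
        raise ValueError("n_estimators_range must contain integers >= 1")
    return stages
```

For example, `check_n_estimators_range(3)` gives the stages 1, 2, 3, while `check_n_estimators_range([50, 10, 10])` gives 10 and 50.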
learning rate shrinks the contribution of each tree by `learning_rate`.
There is a trade-off between learning_rate and n_estimators.

cv_n_estimators : int or array-like of shape (n_cv_stages), (default=100)
Do we use this `cv_` prefix elsewhere for `*CV` objects? We use `Cs`, `alphas`, etc. That convention is hard to adopt here. `learning_curve` uses `param_range`, and I think `n_estimators_range` would be okay here.
+1 for `n_estimators_range`... Thx!
estimator.set_params(n_estimators=n_estimators)
estimator.fit(X_train, y_train, sample_weight=weight_train)
all_stage_scores[i] = scorer(estimator, X_test, y_test,
                             sample_weight=weight_test)
This use of `sample_weight` differs from `GridSearchCV`. Might be best to leave it out for now, else to note it prominently.
But eventually we would want to support `sample_weight` in `GridSearchCV`, no? I'll push a commit which documents this... Let me know if you still want it removed... (Moreover, `GradientBoostingClassifier.fit` supports sample weights, and users might expect a similar interface, maybe?)
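To illustrate the distinction being discussed, the following sketch passes sample weights both to `fit` and to the scorer on a held-out split, and compares the weighted score with the unweighted one. The weights and dataset are made up, and this is only an illustration of the two calls, not the PR's implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import get_scorer
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, random_state=0)
rng = np.random.RandomState(0)
weights = rng.uniform(0.5, 2.0, size=y.shape[0])  # made-up sample weights

X_train, X_test, y_train, y_test, w_train, w_test = train_test_split(
    X, y, weights, random_state=0)

est = GradientBoostingClassifier(n_estimators=20, random_state=0)
# Weights used for training.
est.fit(X_train, y_train, sample_weight=w_train)

scorer = get_scorer("accuracy")
# Scoring without weights: every test sample counts equally.
unweighted = scorer(est, X_test, y_test)
# Scoring with weights: test samples count in proportion to w_test.
weighted = scorer(est, X_test, y_test, sample_weight=w_test)
```

The two scores generally differ, which is why weighting the score as well as the fit is a behaviour change worth documenting prominently.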
Closing in favor of #8230
@jnothman From recent discussions, I think this is simpler and also easier to use...
I'm opening this now. I'll address your comments on this soon...
We support sample weights in grid search that are passed on to fit but not to score.
…On 26 Jan 2017 8:48 am, "(Venkat) Raghav (Rajagopalan)" < ***@***.***> wrote:
Reopened #8226 <#8226>.
Have removed example and addressed your comments. Will add tests, clean up docs and ping you back!
Closing again in favor of updated #8230 ;)
Hahaha you can reopen to implement early stopping...?
:p I was wondering about it... What would be the API? Do we perform cv or specify a validation set?
Also ping @agramfort! :) We discussed this and decided to split it into two different problems: the GradientBoostingCV part and the early stopping part... Now with #8230, using
Spin off from #7071 without the complications of early stopping API
This PR tries to implement just `GradientBoostingClassifierCV` (And I intend to restrict it to GBCCV / GBRCV alone without early stopping support). It takes advantage of the incremental boosting stages and for the same performance is much faster than `GridSearchCV`.
Results
Code for the plot - https://gist.github.com/raghavrv/21d59453de5c6890c89e9f907bcd4044
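As a rough illustration of why reusing the boosting stages is cheaper than a grid search over `n_estimators`, the sketch below fits one model per fold and scores every intermediate stage with the existing `staged_predict` method. It is not the implementation from this PR; the dataset, the number of stages, and the variable names are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=300, random_state=0)
max_stages = 30  # made-up upper bound on n_estimators
cv = StratifiedKFold(n_splits=3)

fold_scores = np.zeros((cv.get_n_splits(), max_stages))
for i, (train, test) in enumerate(cv.split(X, y)):
    # One fit per fold, with the largest number of stages.
    est = GradientBoostingClassifier(n_estimators=max_stages, random_state=0)
    est.fit(X[train], y[train])
    # staged_predict yields the prediction after 1, 2, ..., max_stages trees,
    # so every candidate n_estimators is scored without refitting.
    for stage, y_pred in enumerate(est.staged_predict(X[test])):
        fold_scores[i, stage] = accuracy_score(y[test], y_pred)

mean_scores = fold_scores.mean(axis=0)
best_n_estimators = int(np.argmax(mean_scores)) + 1  # stages are 1-based
```

A plain grid search would instead refit the model from scratch for each candidate value of `n_estimators` in every fold.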
Thanks @agramfort for IRL discussions leading to this simpler PR!!
Also ping @amueller, @jnothman, @vighneshbirodkar, @ogrisel and @pprett
TODO
- Polish example's doc
- Remove example
- GradientBoostingRegressorCV
- GBRCV
- GridSearchCV