[MRG] Support arbitrary init estimators for Gradient Boosting #5221
Conversation
The reordering makes it hard to understand the diff; could you please keep the same ordering for this PR?
Done. It should be easier to understand the diff now.
I am not sure that we should allow classifiers as the initial estimator: gradient boosting solves classification and regression tasks by regressing the negative gradient of a loss function. If we solve this issue, I would prefer to refactor how the losses work. The API of the current initial estimators is pretty bad, as the predict of those estimators is sometimes in fact a disguised predict_proba. To give an overview of the API problem: the PriorProbabilityEstimator is really one mean estimator per class and could also be viewed as a multi-output mean estimator with a one-of-k encoding. Furthermore, in this case the predict doesn't respect the predict API, as it is a disguised predict_proba and outputs an (n_samples, n_classes) array instead of an (n_samples,) array.
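To make the API problem concrete, here is a minimal sketch (not the actual scikit-learn source; the class name and details are illustrative) of the kind of init estimator being described, assuming integer class labels 0 .. n_classes - 1:

```python
import numpy as np

class PriorProbabilitySketch:
    """Sketch of a prior-probability init estimator.

    predict() returns an (n_samples, n_classes) array of class
    priors -- effectively a predict_proba in disguise -- rather
    than the (n_samples,) array the predict API normally promises.
    """

    def fit(self, X, y, sample_weight=None):
        if sample_weight is None:
            sample_weight = np.ones_like(y, dtype=float)
        # Weighted class frequencies become the priors.
        counts = np.bincount(y, weights=sample_weight)
        self.priors_ = counts / counts.sum()
        return self

    def predict(self, X):
        # Every row gets the same vector of priors.
        out = np.empty((X.shape[0], self.priors_.shape[0]))
        out[:] = self.priors_
        return out
```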
@arjoly I agree the patch was much more difficult than it needed to be due to the variety of APIs that could be passed in, and the loss functions do seem very complex and should perhaps be looked at again. I am not very well versed in the use cases, so I can't speak to how useful passing in a base estimator is, but it seems better to support a working version than a buggy one should the user choose to use it.
Alternatively, we can deprecate the init parameter. The refactoring (à la ivalice) would still be a good thing.
Why? I think @agramfort found it a very effective way to train high-performing models.
@pprett do you have experience to share on the use of a non-dummy init for GBRT?
Indeed. It was an idea that @pprett gave me and it worked great in practice.
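For context, this is the kind of usage under discussion: warm-starting gradient boosting from a non-trivial initial model. A minimal sketch, which is expected to run once arbitrary init estimators are supported (by this fix or the later #12983), not on the code base as it stood:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=200, n_features=10, noise=5.0,
                       random_state=0)

# Ridge captures the bulk linear trend; the boosted trees then
# model whatever residual structure is left on top of it.
gbrt = GradientBoostingRegressor(init=Ridge(), random_state=0)
gbrt.fit(X, y)
print(gbrt.predict(X[:5]))
```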
@@ -1107,7 +1172,7 @@ def decision_function(self, X):
         return score

     def _staged_decision_function(self, X):
-        """Compute decision function of ``X`` for each iteration.
+        """Compute decision function of ``, X`` for each iteration.
typo?
Comments incorporated, commits squashed, and rebased on master. I think it is worth fixing this issue for now, since by default no initial estimator is required, and there is a use case.
Does this seem like a useful addition? Let's either decide to put it in or close the PR.
For me, yes: it is worth fixing the issue for now and thinking about a better API for the losses later.
For the initial input, I believe the optimal strategy is loss specific (see the illustration below). So, unless we wrap the optimal strategies for each of the losses, I would for now trust the user and use the provided init estimator.
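To illustrate why the optimal initial prediction is loss specific (these are standard optimization facts, not code from this PR): the constant minimizing squared error is the mean of the targets, while for absolute error it is the median.

```python
import numpy as np

y = np.array([0.5, 1.0, 1.5, 10.0])  # toy targets with one outlier

# Least squares: the constant c minimizing sum((y - c)**2) is the mean.
c_ls = y.mean()       # 3.25

# Least absolute deviation: the minimizing constant is the median.
c_lad = np.median(y)  # 1.25

print(c_ls, c_lad)  # very different starting points for the boosting
```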
This seems like something to keep in mind for when we refactor the losses. I will rebase this PR tomorrow, then.
This has been rebased and should be ready to go. I am unsure what the opinion of others is, though.
@@ -954,7 +955,7 @@ def fit(self, X, y, sample_weight=None, monitor=None):
         check_consistent_length(X, y, sample_weight)

         y = self._validate_y(y)
trailing space
Thanks @ogrisel, comments have been addressed!
One-hot encoding the results from `predict`.
@@ -965,8 +964,41 @@ def fit(self, X, y, sample_weight=None, monitor=None):
         # fit initial model - FIXME make sample_weight optional
         self.init_.fit(X, y, sample_weight)

         # init predictions
         y_pred = self.init_.predict(X)
+        if is_classifier(self.init_):
trailing space
Is this now solved?
This branch was never merged.
This is sad...
Can you rebase? I'll put reviewing this PR on my TODO list.
Any update? Has this been, or will it be, part of the 0.18 release?
@TalhaAsmal No.
@amueller is there interest in reviving this? If so, I can cherry-pick the commits and try resurrecting this...
The original issue was solved by #12983. This PR can be closed.
In #2691, it was brought up that passing an estimator as the initial estimator to Gradient Boosting Classifiers or Regressors would cause a crash due to various shape issues on `y`. The main issue is that the predictions of the init estimator are expected to have shape `(n_samples, n_classes)` for multinomial classification, or `(n_samples, 1)` for binary classification and regression.
This PR solves the issue by handling each case separately. If the initialization estimator is one of those already in `gradient_boosting.py`, it uses those predictions as normal. However, if a classifier was passed in, it checks whether the classifier has a `predict_proba` method and uses that if possible (collapsing into the log odds if there are only two classes). If the classifier does not have a `predict_proba` method, it uses the `predict` method and one-hot encodes the result into a matrix; if this needs to be collapsed because there are only two classes, it adds a small epsilon to the matrix before calculating the log odds. If a regressor is passed in, the predictions are just reshaped to the expected shape. A sketch of this conversion logic follows below.

I also reordered the code a little bit to make it more organized, and added two unit tests to make sure that it works. It now works with arbitrary estimators, as long as they take sample weights in their `fit` method; the unit tests include Support Vector Machine and ridge regression initializations.
ping @ogrisel @agramfort @glouppe @pprett
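For illustration, a minimal sketch of the conversion logic described above. `init_predictions` is a hypothetical helper, not the code in this PR, and it assumes a fitted classifier with integer class labels 0 .. n_classes - 1:

```python
import numpy as np

def init_predictions(init_est, X, n_classes, eps=1e-7):
    """Hypothetical helper mirroring the strategy described above.

    Returns predictions shaped the way the boosting loop expects:
    (n_samples, n_classes) scores for multiclass problems, or a
    single log-odds column for binary problems.
    """
    if hasattr(init_est, "predict_proba"):
        proba = init_est.predict_proba(X)  # (n_samples, n_classes)
    else:
        # One-hot encode hard predictions into an indicator matrix.
        pred = np.asarray(init_est.predict(X), dtype=int)
        proba = np.zeros((X.shape[0], n_classes))
        proba[np.arange(X.shape[0]), pred] = 1.0
    if n_classes == 2:
        # Collapse to log odds; the small epsilon keeps the log
        # finite when a class received probability exactly zero
        # (as happens with one-hot encoded hard predictions).
        proba = proba + eps
        return np.log(proba[:, 1] / proba[:, 0]).reshape(-1, 1)
    return proba
```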