[MRG] Support arbitrary init estimators for Gradient Boosting by jmschrei · Pull Request #5221 · scikit-learn/scikit-learn

Closed
jmschrei wants to merge 3 commits

Conversation

@jmschrei (Member) commented Sep 7, 2015

In #2691, it was brought up that passing an estimator as the initial estimator to Gradient Boosting Classifiers or Regressors would cause a crash due to various shape issues on y. The main issue is that the predictions of the initial estimator are expected to be of shape (n_samples, n_classes) for multinomial classification, and (n_samples, 1) for binary classification and regression.

This PR solves this issue by handling each case separately. If the initialization estimator is one of those already in gradient_boosting.py, its predictions are used as before. If a classifier is passed in, the code checks whether it has a predict_proba method and uses that if possible (collapsing into the log odds if there are only two classes). If the classifier does not have a predict_proba method, its predict output is one-hot encoded into a matrix; if this needs to be collapsed because there are only two classes, a small epsilon is added to the matrix before computing the log odds. If a regressor is passed in, the predictions are simply reshaped into a column vector.

I also reordered the code a bit so that it is better organized, and added two unit tests to make sure that it works. It now works with arbitrary estimators, as long as their fit method accepts sample weights, and the unit tests include Support Vector Machine and ridge regression initializations.

ping @ogrisel @agramfort @glouppe @pprett
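A rough sketch of the shape handling described above (illustrative only: the helper name, epsilon value, and exact clipping choices are assumptions, not code from this PR):

import numpy as np
from sklearn.base import is_classifier


def init_decision(init_est, X, eps=1e-7):
    # Illustrative only: map a fitted init estimator's output onto the
    # (n_samples, K) score matrix that gradient boosting works with.
    if is_classifier(init_est):
        classes = init_est.classes_
        if hasattr(init_est, "predict_proba"):
            proba = init_est.predict_proba(X)
        else:
            # one-hot encode hard predictions (assumes classes_ is sorted,
            # as scikit-learn classifiers guarantee); eps keeps the log finite
            proba = np.full((X.shape[0], len(classes)), eps)
            cols = np.searchsorted(classes, init_est.predict(X))
            proba[np.arange(X.shape[0]), cols] = 1.0
        proba = np.clip(proba, eps, 1 - eps)
        if len(classes) == 2:
            # binary case: collapse the two columns into one log-odds column
            return np.log(proba[:, 1] / proba[:, 0])[:, np.newaxis]
        return np.log(proba)
    # regressor: reshape 1-D predictions into a column vector
    return init_est.predict(X)[:, np.newaxis]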

@ogrisel (Member) commented Sep 7, 2015

The reordering makes it hard to understand the diff; could you please keep the original ordering for this PR?

@jmschrei (Member, Author) commented Sep 7, 2015

Done. Should be easier to understand the diff now.

@arjoly (Member) commented Sep 7, 2015

I am not sure that we should allow classifiers as the initial estimator. Gradient boosting solves classification and regression tasks by regressing the negative gradient of a loss function.

If we solve this issue, I would prefer to refactor how the losses work. The API of the current initial estimators is pretty bad, since the predict of those estimators is sometimes really a predict_proba or a decision_function. I would favour something simpler, à la ivalice, with only regressors as weak estimators.

To give an overview of the API problem, the PriorProbabilityEstimator is really one mean estimator per class and could also be viewed as a multi-output mean estimator with a one-of-K encoding. Furthermore, in this case predict doesn't respect the predict API: it is a disguised predict_proba and outputs an (n_samples, n_classes) array instead of an (n_samples,) array.
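For readers unfamiliar with the internals, a self-contained paraphrase of the quirk described above (the real PriorProbabilityEstimator in gradient_boosting.py differs in detail; this only mirrors its output shape):

import numpy as np


class PriorLikeEstimator:
    # Paraphrase of the internal prior-probability init estimator: fit stores
    # the (weighted) class frequencies and predict returns them tiled for
    # every sample, i.e. a predict_proba in disguise.
    def fit(self, X, y, sample_weight=None):
        if sample_weight is None:
            sample_weight = np.ones(len(y))
        classes = np.unique(y)
        self.priors_ = np.array(
            [sample_weight[y == c].sum() for c in classes]
        ) / sample_weight.sum()
        return self

    def predict(self, X):
        # shape (n_samples, n_classes) rather than the (n_samples,) array
        # a conventional predict would return
        return np.tile(self.priors_, (X.shape[0], 1))


X = np.zeros((5, 2))
y = np.array([0, 0, 1, 1, 1])
print(PriorLikeEstimator().fit(X, y).predict(X).shape)  # (5, 2)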

@jmschrei (Member, Author) commented Sep 7, 2015

@arjoly I agree the patch was much more difficult than it needed to be due to the variety of APIs that can be passed in, and the loss functions do seem very complex and should perhaps be looked at again. I am not well versed in the use cases, so I can't speak to how useful passing in a base estimator is, but it seems better to support a working version than a buggy one should the user choose to use it.

@arjoly (Member) commented Sep 7, 2015

Alternatively, we can deprecate the init parameter. The refactoring (à la ivalice) would still be a good thing.

@ogrisel (Member) commented Sep 7, 2015

Alternatively, we can deprecate the init parameter.

Why? I think @agramfort found it a very effective way to train high-performing models:

  • train a random forest classifier as a good baseline model
  • make it better by gradient boosting on top of that.

@pprett do you have experience to share on the use of non-dummy init for GBRT?
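For context, the usage pattern being described looks roughly like this (at the time of this PR it crashed with shape errors and only works once arbitrary init estimators are properly supported; the parameter values are arbitrary):

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)

# a strong baseline model used to initialise the boosting stages,
# which the boosting iterations then refine
clf = GradientBoostingClassifier(
    init=RandomForestClassifier(n_estimators=100, random_state=0),
    n_estimators=50,
    random_state=0,
)
clf.fit(X, y)
print(clf.score(X, y))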

@agramfort (Member) commented Sep 7, 2015 via email

@@ -1107,7 +1172,7 @@ def decision_function(self, X):
         return score
 
     def _staged_decision_function(self, X):
-        """Compute decision function of ``X`` for each iteration.
+        """Compute decision function of ``, X`` for each iteration.
Review comment (Member):

typo?

@arjoly (Member) commented Sep 9, 2015

indeed. It was an idea that @pprett gave me and worked great in practice.

Good to know. This pull request could also be viewed in light of #5044

@jmschrei (Member, Author) commented Sep 9, 2015

Comments incorporated, commits squashed, and rebased on master. I think it is worth fixing this issue for now, since the default option is to not require an initial estimator and there is a use case.

@jmschrei (Member, Author) commented:

Does this seem like a useful addition? Let's either decide to put it in or close the PR.

@glouppe (Contributor) commented Sep 11, 2015

Does this seem like a useful addition? Let's either decide to put it in or close the PR.

For me, yes, it is worth fixing the issue for now, and then thinking about a better API for losses.

@glouppe (Contributor) commented Sep 13, 2015

For the initial input, I believe the optimal strategy is loss-specific. So, unless we wrap around the optimal strategies for each of the losses, I would for now trust the user and use the provided init estimator without doing any transformation.
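To make the loss-specific point concrete, the optimal constant initial prediction already differs across the built-in losses (a small illustration, not scikit-learn code):

import numpy as np

y_reg = np.array([0.0, 1.0, 1.0, 5.0])
print(y_reg.mean())      # least squares: the mean minimises squared error -> 1.75
print(np.median(y_reg))  # least absolute deviation: the median minimises L1 error -> 1.0

y_clf = np.array([0, 1, 1, 0])
p = y_clf.mean()
print(np.log(p / (1 - p)))  # binomial deviance: log-odds of the positive-class prior -> 0.0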

@jmschrei (Member, Author) commented:

This seems like something to keep in mind for when we refactor the losses. I will rebase this PR tomorrow then.

@jmschrei (Member, Author) commented:

This has been rebased and should be ready to go. I am unsure what the opinion of others is, though.

@@ -954,7 +955,7 @@ def fit(self, X, y, sample_weight=None, monitor=None):
        check_consistent_length(X, y, sample_weight)

        y = self._validate_y(y)

Review comment (Member):

trailing space

@jmschrei (Member, Author) commented:

Thanks @ogrisel, comments have been addressed!

@jmschrei (Member, Author) commented:

One-hot encoding the results from predict for classifiers has been removed, since classifiers can predict non-contiguous integers or strings as labels. A classifier passed as init must now have a predict_proba method, while a regressor will use its predict method.

ping @ogrisel @arjoly @glouppe @agramfort
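A minimal sketch of the validation rule described above (illustrative only; the helper name and error messages are assumptions, not code from this PR):

from sklearn.base import is_classifier


def check_init_estimator(init):
    # Classifiers used as ``init`` must expose predict_proba so their output
    # can be turned into class scores; regressors only need predict.
    if is_classifier(init):
        if not hasattr(init, "predict_proba"):
            raise ValueError(
                "init classifier %r must implement predict_proba"
                % type(init).__name__
            )
    elif not hasattr(init, "predict"):
        raise ValueError(
            "init regressor %r must implement predict" % type(init).__name__
        )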

@@ -965,8 +964,41 @@ def fit(self, X, y, sample_weight=None, monitor=None):
        # fit initial model - FIXME make sample_weight optional
        self.init_.fit(X, y, sample_weight)

        # init predictions
        y_pred = self.init_.predict(X)
        if is_classifier(self.init_):
Review comment (Member):

trailing space

@tsterbak commented:

Is this now solved?
I have problems passing a RandomForestClassifier as init.

@jmschrei (Member, Author) commented:

This branch was never merged.

@tsterbak commented:

This is sad...

@MechCoder (Member) commented:

Can you rebase? I'll put reviewing this PR on my TODO list.

@TalhaAsmal commented:

Any update? Has this been, or will it be, part of the 0.18 release?

@amueller (Member) commented Sep 9, 2016

@TalhaAsmal No.

@raghavrv added this to the 0.19 milestone Oct 28, 2016
@raghavrv (Member) commented:

@amueller is there interest in reviving this? If so, I can cherry-pick the commits and try resurrecting it...

@amueller modified the milestone: 0.19 Jun 12, 2017
Base automatically changed from master to main January 22, 2021 10:48
@jeremiedbb (Member) commented:

The original issue was solved by #12983. This PR can be closed.

@jeremiedbb closed this Apr 27, 2022