[MRG] FIX gradient boosting with sklearn estimator as init #12436
Conversation
I'm not too familiar with GB code, but this looks good to me except for a minor comment below.
You also need to repeat "close" or "fix" in the description for every issue you want to auto-close, e.g. close #123, close #1234, etc.
I can confirm the added test fails on master.
Maybe @glemaitre would be able to do another review.
@@ -1421,7 +1421,9 @@ def fit(self, X, y, sample_weight=None, monitor=None):
         self.init_.fit(X, y, sample_weight)

         # init predictions
-        y_pred = self.init_.predict(X)
+        y_pred = self.init_.predict(X).astype(np.float64, copy=False)
+        if y_pred.shape == (X.shape[0],):
Wouldn't y_pred.ndim == 1 be enough?
probably :)
@@ -1421,7 +1421,9 @@ def fit(self, X, y, sample_weight=None, monitor=None):
         self.init_.fit(X, y, sample_weight)

         # init predictions
-        y_pred = self.init_.predict(X)
+        y_pred = self.init_.predict(X).astype(np.float64, copy=False)
What happens if you don't cast it to 64-bit?
An error is raised due to unsafe casting because y_pred is updated later by the actual fit of the gradient boosting, and filled with float64 values.
y_pred[:, k] += (learning_rate * tree.value[:, 0, 0].take(terminal_regions, axis=0))
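For illustration, a minimal standalone NumPy sketch (not the scikit-learn code itself; array names and values are made up) of the kind of error that this unsafe in-place cast produces:

import numpy as np

# integer class labels, as e.g. a classifier's predict() might return
y_pred = np.zeros((5, 1), dtype=np.int64)

# the boosting update adds float64 values in place; NumPy refuses to
# cast them back into an integer array
try:
    y_pred[:, 0] += 0.1 * np.random.rand(5)
except TypeError as exc:
    print(exc)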
Also please add a what's new entry. I guess we could try to squeeze it into 0.20.1 since it's a long-standing issue that has been reported multiple times.
@stoddardg pointed out another related bug: when the init estimator does not support sample weights, it crashes even if one calls fit with sample_weight=None.
-        # fit initial model - FIXME make sample_weight optional
-        self.init_.fit(X, y, sample_weight)
+        # fit initial model
+        if has_fit_parameter(self.init_, "sample_weight"):
We should avoid has_fit_parameter unless try-except really won't work.
It means keyword args won't work. We should only use this kind of thing when we really need to take a different approach when sample_weight is not supported. (Feel free to document this caveat more in a separate PR.)
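A rough sketch of the try/except pattern being advocated here (a standalone helper with illustrative names; the eventual code in this PR differs in details):

def fit_init_estimator(init_estimator, X, y, sample_weight=None):
    # Trying the call and catching TypeError also works for estimators
    # whose fit() only accepts keyword arguments (**kwargs), which
    # signature inspection via has_fit_parameter would miss.
    if sample_weight is None:
        init_estimator.fit(X, y)
        return init_estimator
    try:
        init_estimator.fit(X, y, sample_weight=sample_weight)
    except TypeError:
        raise ValueError(
            "The initial estimator {} does not support sample "
            "weights.".format(init_estimator.__class__.__name__))
    return init_estimator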
I'm not certain this should go in 0.20.1, but whatever...?
Nice work!
@rth I made some changes since your last review. Maybe you want to give another look at it.
It's not a regression from 0.19 so maybe it should go in 0.21 indeed.
    # init does not support sample weights
    init_est = _NoSampleWeightWrapper(init())
    gb(init=init_est).fit(X, y)  # ok no sample weights
    with pytest.raises(ValueError):
To be more specific about the error:

-    with pytest.raises(ValueError):
+    with pytest.raises(
+            ValueError,
+            match=r'.*estimator.*does not support sample weights.*'):
    gb, dataset_maker, init = task

    X, y = dataset_maker()
    sample_weight = np.random.rand(100)
Let's use an RNG:

-    sample_weight = np.random.rand(100)
+    sample_weight = np.random.RandomState(42).rand(100)
    # estimator.
    # Check that an error is raised if trying to fit with sample weight but
    # initial estimator does not support sample weight
    gb, dataset_maker, init = task
Maybe use a slightly more explicit name here, e.g. init_estimator.
        else:
            raise ValueError(
                "The initial estimator {} does not support sample "
                "weights yet.".format(self.init_.__class__.__name__))
-                "weights yet.".format(self.init_.__class__.__name__))
+                "weights.".format(self.init_.__class__.__name__))
It was to point out that it's not a definitive statement :)
        if sample_weight is None:
            sample_weight = np.ones(n_samples, dtype=np.float32)
            sample_weight_was_none = True
-            sample_weight_was_none = True
+            sample_weight_is_none = True
        else:
            sample_weight = column_or_1d(sample_weight, warn=True)
            sample_weight_was_none = False
-            sample_weight_was_none = False
+            sample_weight_is_none = False
        # fit initial model
        try:
            self.init_.fit(X, y, sample_weight=sample_weight)
        except TypeError:
Maybe calling inspect.signature(self.init_.fit) and adapting the kwargs accordingly would be cleaner? Here I wonder what happens if one gets a genuine, unrelated TypeError; understanding the traceback will then be somewhat difficult.
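For reference, a rough sketch of the signature-introspection alternative being floated, written as if inside fit() (purely illustrative, not code from this PR):

import inspect

# pass sample_weight only when the init estimator's fit() declares it;
# note this misses estimators whose fit() only takes **kwargs
fit_params = {}
if 'sample_weight' in inspect.signature(self.init_.fit).parameters:
    fit_params['sample_weight'] = sample_weight
self.init_.fit(X, y, **fit_params)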
Nevermind, I see, it was the case originally and Joel requested the opposite change in #12436 (comment)
Happy to see this fixed!
The current fix does not support multiclass classification; I added some suggestions.
@pytest.mark.parametrize(
    "task",
    [(GradientBoostingClassifier, make_classification, DummyClassifier),
     (GradientBoostingRegressor, make_regression, DummyRegressor)],
Testing for multiclass classification currently fails:

def make_multiclass():
    return make_classification(n_classes=3, n_clusters_per_class=1)

@pytest.mark.parametrize(
    "task",
    [(GradientBoostingClassifier, make_classification, DummyClassifier),
     (GradientBoostingClassifier, make_multiclass, DummyClassifier),
     (GradientBoostingRegressor, make_regression, DummyRegressor)],
    ids=["classification", "multiclass", "regression"])
def test_gradient_boosting_with_init(task):
    ...
    # will fail with make_multiclass
-        y_pred = self.init_.predict(X)
+        y_pred = self.init_.predict(X).astype(np.float64, copy=False)
+        if y_pred.ndim == 1:
+            y_pred = y_pred[:, np.newaxis]
-        y_pred = y_pred[:, np.newaxis]
+        if self.loss_.K >= 2:
+            # multiclass classification needs y_pred to be of shape
+            # (n_samples, K) where K is the number of trees per
+            # iteration, i.e. the number of classes.
+            # see e.g. PriorProbabilityEstimator or ZeroEstimator
+            n_samples = y_pred.shape[0]
+            y_pred = np.tile(y_pred, self.n_classes_)
+            y_pred = y_pred.reshape(n_samples, self.loss_.K)
+        else:
+            y_pred = y_pred[:, np.newaxis]
Good catch. Wouldn't it be better to make a one-hot encoding of y_pred?
I pushed it, let me know which version is best.
Yes, one-hot seems to be a better option. I think my version would have been equivalent to a zero initialization (or any constant value).
BTW I realized we could use if loss.is_multi_class to be more explicit.
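A minimal sketch of the one-hot idea, using LabelBinarizer for illustration (the helper name is hypothetical and this is not necessarily the code that was pushed):

import numpy as np
from sklearn.preprocessing import LabelBinarizer

def one_hot_init_predictions(init_estimator, X, classes):
    # Turn the init classifier's predicted class labels into an
    # (n_samples, n_classes) one-hot float64 matrix.
    # Note: with only two classes LabelBinarizer returns a single
    # column, so the binary case still needs special handling.
    binarizer = LabelBinarizer().fit(classes)
    predictions = init_estimator.predict(X)
    return binarizer.transform(predictions).astype(np.float64)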
Should we detect that this is happening and bail in the partial dependence plot? Non-constant init estimators will make the partial dependence plot misleading / wrong.
@amueller what do you mean by "this is happening"? I don't think this is related to the partial dependence plot; we're simply fixing a shape issue here.
By "this" I mean "a non-constant init estimator". So far in our implementation, init estimators were always constant. After this patch they will not be.
LGTM again! Thanks @NicolasHug!
Thinking about this again, I don't think it makes any sense to have an init predictor in a classification setting (it's OK for regression). So currently the init estimator's predictions, which are class labels, are used where the boosting expects raw values. This is not a problem in regression because the init predictions and the boosting predictions live in the same space. I would propose to either change how the init predictions are used, or restrict which init estimators are accepted (details below).
Slightly related: lots of the docstrings actually state that … Hope it's clear, I can try to clarify if needed.
Is a classifier with decision_function appropriate? Or is it appropriate to use an initial regressor in classification? I've not thought about it, just trying to understand your proposal on the surface, @NicolasHug
That would be slightly less inappropriate than using predict, but I don't think it is the right fix either. In binary classification, y_pred is expected to hold the raw (log-odds) predictions used by the loss, not class labels.
My proposition is to replace the constant initial values by values derived from the init estimator's predict_proba output. It does not make sense to set init to a regressor in a classification setting.
From discussing with @NicolasHug I think the solution in this PR is not correct for the classification case and we should do what @NicolasHug proposes (which actually also cleans up the GradientBoosting interface).
I propose the following: each loss function should have a method that maps the fitted init estimator's output to the raw prediction space of the loss. For the binary classification loss, this would look like:
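A rough sketch of what such a method could look like for the binomial deviance loss (the name get_init_raw_predictions, the clipping constant and the reshaping are assumptions for illustration; numpy is assumed imported as np):

def get_init_raw_predictions(self, X, estimator):
    # map the init classifier's predicted probabilities to the
    # log-odds (raw prediction) space used by the boosting iterations
    proba_pos_class = estimator.predict_proba(X)[:, 1]
    proba_pos_class = np.clip(proba_pos_class, 1e-12, 1 - 1e-12)
    raw_predictions = np.log(proba_pos_class / (1 - proba_pos_class))
    return raw_predictions.reshape(-1, 1).astype(np.float64)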
while for least squares it would be:
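And a corresponding sketch for the least squares loss (same caveats as above):

def get_init_raw_predictions(self, X, estimator):
    # regression: the init estimator already predicts in the right
    # space, so only the shape and dtype need adjusting
    return estimator.predict(X).reshape(-1, 1).astype(np.float64)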
The built-in init estimators like ZeroEstimator or PriorProbabilityEstimator would no longer be needed. It would be similar to what we do in pygbm, except that pygbm does not accept custom init estimators. @jeremiedbb you probably didn't sign up for this so if you want, I can take it up. I was planning on refactoring the GB code anyway (putting losses in another file, renaming y_pred, etc.).
Does it make sense to make it get_init_probabilities and put the sigmoid inside the method?
I don't think so, because not all losses deal with probabilities (e.g. the regression losses), and what the boosting iterations need are the raw predictions.
@NicolasHug feel free to take over. I won't have much time to work on it, and this is going out of my comfort zone :)
Fixes #10302, Fixes #12429, Fixes #2691
Gradient Boosting used to fail when init was a sklearn estimator, which is a bit ironic :)
The issue was that the predict output didn't have the expected shape, and apparently there was no test for the init parameter with an estimator other than the default.
Edit: Also accept an initial estimator that does not support sample weights, as long as the gradient boosting is not fitted with sample weights.
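For illustration, a minimal usage sketch of the scenario being fixed (made-up data; with the fix applied this should run without errors):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=100, n_features=5, random_state=0)

# Using a scikit-learn estimator as the initial model used to crash
# because init.predict(X) has shape (n_samples,) rather than (n_samples, 1).
gb = GradientBoostingRegressor(init=LinearRegression(), n_estimators=10,
                               random_state=0)
gb.fit(X, y)
print(gb.predict(X[:3]))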