[MRG] Fix GBDT init parameter when it's a pipeline #13472

NicolasHug · 2019-03-18T22:15:59Z

Reference Issues/PRs

Fixes #13466

What does this implement/fix? Explain your changes.

This PR fixes the support of the init estimator of GBDTs when init is a pipeline.

Note that pipelines do not support sample weights.

Any other comments?

Thomasillo · 2019-03-19T08:22:10Z

Perfect, thanks a lot!

adrinjalali · 2019-03-19T09:10:03Z

sklearn/ensemble/gradient_boosting.py

@@ -1484,7 +1484,7 @@ def fit(self, X, y, sample_weight=None, monitor=None):
            else:
                try:
                    self.init_.fit(X, y, sample_weight=sample_weight)
-                except TypeError:
+                except (TypeError, ValueError):


I feel the catching here should be stricter, not looser. A lot of (if not most of) the init param validation happens in fit, which raises a ValueError if the parameters are not valid. With this change the user would instead see a message complaining about sample_weight, which would be irrelevant.

Wouldn't checking the signature, or the [appropriate] estimator tag be a better idea here?

the user would instead see a message complaining about sample_weights which would be irrelevant.

Why is it irrelevant? This is precisely what this check is about.

Wouldn't checking the signature, or the [appropriate] estimator tag be a better idea here?

Yes I agree, but apparently using a try/except is preferred: see #12983 (comment)

EDIT: just saw that a supports_sample_weight tag is currently discussed #13438 but it's far from done

It's not elegant but I'm okay with this.

For example, NuSVR supports sample_weight in fit:

    X, y = make_regression(random_state=0)
    NuSVR(nu=1.5).fit(X, y, sample_weight=np.ones(X.shape[0]))

gives:

ValueError: nu <= 0 or nu > 1

But after this PR:

    init = make_pipeline(NuSVR(nu=1.5))
    gb = GradientBoostingRegressor(init=init)
    gb.fit(X, y, sample_weight=np.ones(X.shape[0]))

gives:

ValueError: The initial estimator Pipeline does not support sample weights.

Ooh you were talking about the input checking of the init estimator, ok, good point.

@jnothman do you think using has_fit_param() would be justified here? As far as I understand, this would only be a problem if a user passes a custom estimator whose fit() accepts sample_weight through its keyword arguments (**kwargs) rather than as an explicit parameter.

Another option would be to test the error message of the ValueError coming from a pipeline and only raise in this case?

I don't see how has_fit_param helps here.

But perhaps we should raise a more equivocal error message ("could not fit init estimator with sample_weight") and use raise from to report the original exception

I don't see how has_fit_param helps here.

This check is only supposed to check whether the init estimator supports sample weights. I had to add ValueError for pipelines because, unlike traditional estimators, they don't raise a TypeError. As @adrinjalali noted, the check now also catches ValueErrors raised for other reasons (namely input checking).
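To make the two failure modes concrete, here is a minimal sketch using toy classes only. `PlainEstimator`, `ToyPipeline` and `failure_type` are hypothetical names made up for illustration; `ToyPipeline` merely assumes that a pipeline routes fit params by splitting their names on `__`, which is what the 'not enough values to unpack' message suggests. It is not the real Pipeline code.

```python
class PlainEstimator:
    """Toy estimator whose fit() has no sample_weight parameter."""

    def fit(self, X, y):
        return self


class ToyPipeline:
    """Toy stand-in for a pipeline (assumption: params are named step__param)."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y, **fit_params):
        routed = {}
        for pname, pval in fit_params.items():
            # A name without "__" yields a single item, so the tuple
            # unpacking below raises ValueError, not TypeError.
            step, param = pname.split("__", 1)
            routed[param] = pval
        self.estimator.fit(X, y, **routed)
        return self


def failure_type(estimator):
    """Return (exception class name, message) raised by a weighted fit."""
    try:
        estimator.fit([[0.0]], [0.0], sample_weight=[1.0])
    except (TypeError, ValueError) as exc:
        return type(exc).__name__, str(exc)
    return None, ""


print(failure_type(PlainEstimator())[0])
print(failure_type(ToyPipeline(PlainEstimator())))
```

Under these assumptions the plain estimator fails with a TypeError (unexpected keyword argument), while the pipeline-like wrapper fails with a ValueError, which is why catching TypeError alone was not enough.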

Using has_fit_param would avoid this, I think.

Using has_fit_param would avoid this, I think.

How so?

Using has_fit_param avoids using a try except

jnothman · 2019-03-19T22:32:06Z

sklearn/ensemble/gradient_boosting.py

@@ -1484,7 +1484,7 @@ def fit(self, X, y, sample_weight=None, monitor=None):
            else:
                try:
                    self.init_.fit(X, y, sample_weight=sample_weight)
-                except TypeError:
+                except (TypeError, ValueError):


It's not elegant but I'm okay with this.

jnothman · 2019-03-22T00:18:03Z

Yes but it has false negatives
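A small sketch of the false-negative case, assuming a signature-based check along the lines of has_fit_param. `accepts_fit_param` and the three toy estimators are hypothetical stand-ins written for this illustration, not scikit-learn code.

```python
import inspect


def accepts_fit_param(estimator, name):
    """Return True if `name` is an explicit parameter of estimator.fit."""
    return name in inspect.signature(estimator.fit).parameters


class ExplicitWeights:
    def fit(self, X, y, sample_weight=None):
        return self


class KwargsWeights:
    # Accepts sample_weight, but only through **kwargs.
    def fit(self, X, y, **kwargs):
        self.sample_weight_ = kwargs.get("sample_weight")
        return self


class NoWeights:
    def fit(self, X, y):
        return self


for est in (ExplicitWeights(), KwargsWeights(), NoWeights()):
    print(type(est).__name__, accepts_fit_param(est, "sample_weight"))
```

`KwargsWeights` is reported as not supporting sample_weight even though its fit() would accept it, which is the false negative a try/except does not have.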

NicolasHug · 2019-03-22T01:32:22Z

I updated the code. I hope it's clearer now.

thomasjpfan · 2019-03-22T14:32:57Z

sklearn/ensemble/gradient_boosting.py

+                        if 'not enough values to unpack' in str(e):  # pipeline
+                            raise ValueError(msg)
+                        else:  # regular estimator whose input checking failed
+                            raise e


Small nit:

    except ValueError as e:
        if 'not enough values to unpack' in str(e):  # pipeline
            raise ValueError(msg)
        raise  # regular estimator whose input checking failed

thomasjpfan · 2019-03-22T15:44:30Z

sklearn/ensemble/gradient_boosting.py

                        else:  # regular estimator whose input checking failed
-                            raise e
+                            raise


Nit: Do not need the else here

I personally prefer the whole if/else logic. It's clearer, it doesn't rely on the fact that the above block exits, and has a more functional flavor.
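For reference, the handling being discussed could look roughly like the sketch below. It is a simplified, self-contained approximation and not the code in gradient_boosting.py: `fit_init`, `NoWeightSupport`, `BadHyperparameter` and `ToyPipeline` are hypothetical names, and the toy pipeline only assumes that fit params are split on `__`.

```python
class NoWeightSupport:
    def fit(self, X, y):
        return self


class BadHyperparameter:
    """Supports sample_weight but validates a hyperparameter in fit()."""

    def __init__(self, nu):
        self.nu = nu

    def fit(self, X, y, sample_weight=None):
        if not 0 < self.nu <= 1:
            raise ValueError("nu <= 0 or nu > 1")
        return self


class ToyPipeline:
    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y, **fit_params):
        for pname in fit_params:
            # Raises ValueError when the name has no "__" separator.
            step, param = pname.split("__", 1)
        self.estimator.fit(X, y)
        return self


def fit_init(init, X, y, sample_weight):
    msg = "The initial estimator {} does not support sample weights.".format(
        type(init).__name__)
    try:
        init.fit(X, y, sample_weight=sample_weight)
    except TypeError as e:  # regular estimator without sample_weight support
        raise ValueError(msg) from e
    except ValueError as e:
        if "not enough values to unpack" in str(e):  # pipeline-like wrapper
            raise ValueError(msg) from e
        else:  # regular estimator whose input checking failed
            raise
    return init
```

With this shape, the "nu <= 0 or nu > 1" error from `BadHyperparameter(1.5)` passes through unchanged, while the unsupported cases get the sample-weight message with the original exception chained. Note that any TypeError raised inside fit() is still reported as missing sample-weight support.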

jnothman

A bit messy but okay I guess. Maybe we should just make Pipeline raise a TypeError (or something that's both Type and Value), though.

jnothman · 2019-03-23T22:44:49Z

sklearn/ensemble/gradient_boosting.py

+                    except TypeError:  # regular estimator without SW support
+                        raise ValueError(msg)
+                    except ValueError as e:
+                        if 'not enough values to unpack' in str(e):  # pipeline


I'd rather we improved the message for fit params missing __ in Pipeline, but okay

NicolasHug · 2019-03-23T23:01:53Z

I think we'll be able to make it much better once we have a 'supports_sample_weight' tag

adrinjalali · 2019-03-25T16:54:38Z

I think we'll be able to make it much better once we have a 'supports_sample_weight' tag

Could you then add a TODO or an XXX note so that at some point we change it to exploit the tag once it's there?

adrinjalali · 2019-03-27T21:45:44Z

@jnothman does your approval here stand? I'm not sure what you think about this one now.

jnothman · 2019-03-27T23:01:20Z

Let's merge and then change when #13534 is fixed.

jnothman · 2019-03-27T23:01:37Z

Thanks @NicolasHug!

NicolasHug added 2 commits March 18, 2019 18:08

GBDT init now supports pipelines (incompatible with sample weight)

8486ccb

better handling

e4e6de7

NicolasHug mentioned this pull request Mar 18, 2019

GradientBoostingRegressor initial estimator does not play together with Pipeline #13466

Closed

adrinjalali reviewed Mar 19, 2019

jnothman approved these changes Mar 19, 2019

NicolasHug added 3 commits March 21, 2019 21:18

pass through exception from failed input checking of init estimator

050366f

actually correct fix

3801a1e

typos

2cb1bb0

adrinjalali approved these changes Mar 22, 2019

thomasjpfan reviewed Mar 22, 2019

Used raise from and remove raise e

b7f7f9a

thomasjpfan reviewed Mar 22, 2019

jnothman reviewed Mar 23, 2019

NicolasHug added 2 commits March 25, 2019 13:32

Added XXX comment

d1ffe54

Merge remote-tracking branch 'upstream/master' into fix_gbdt_init_pip…

f1124f9

…eline

jnothman merged commit d6b368e into scikit-learn:master Mar 27, 2019

okz12 mentioned this pull request Apr 6, 2019

[MRG+1] Fix/pipeline param error msg #13536

Merged

jnothman mentioned this pull request Apr 24, 2019

DOC what's new cleaning #13706

Merged

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

FIX GBDT init parameter when it's a pipeline (scikit-learn#13472)

7ab82a3

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "FIX GBDT init parameter when it's a pipeline (scikit-learn#13472)"

34b74cf

This reverts commit 7ab82a3.

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "FIX GBDT init parameter when it's a pipeline (scikit-learn#13472)"

5551301

This reverts commit 7ab82a3.

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019

FIX GBDT init parameter when it's a pipeline (scikit-learn#13472)

5ec2096
