8000 DEP Deprecate n_classes_ in GradientBoostingRegressor by simonamaggio · Pull Request #17702 · scikit-learn/scikit-learn · GitHub

Merged: 29 commits merged into scikit-learn:master on Aug 13, 2020

    Conversation

simonamaggio (Contributor) commented Jun 24, 2020:

    Fix #17673

In the base class, use a private `_n_classes` attribute; in the classifier, create the public attribute `n_classes_`.
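A minimal sketch of that approach, not the actual PR code: the class name is hypothetical, and the decorator is `sklearn.utils.deprecated`, the same one used in the review threads below.

```python
# Sketch only: the regressor stores a private _n_classes and exposes a
# deprecated read-only n_classes_ property for backward compatibility.
from sklearn.utils import deprecated


class RegressorSketch:  # hypothetical class name for illustration
    def fit(self, X, y):
        # only the private attribute is set during fit
        self._n_classes = 1
        return self

    @deprecated("Attribute n_classes_ was deprecated "  # type: ignore
                "in version 0.24 and will be removed in 0.26.")
    @property
    def n_classes_(self):
        return self._n_classes
```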

glemaitre (Member):

    I think that there is a cleaner solution which might require a bit more refactoring.

glemaitre (Member) commented Jun 24, 2020:
```diff
diff --git a/sklearn/ensemble/_gb.py b/sklearn/ensemble/_gb.py
index 179eb81816..f3223c2d36 100644
--- a/sklearn/ensemble/_gb.py
+++ b/sklearn/ensemble/_gb.py
@@ -165,6 +165,10 @@ class BaseGradientBoosting(BaseEnsemble, metaclass=ABCMeta):
         self.n_iter_no_change = n_iter_no_change
         self.tol = tol
 
+    @abstractmethod
+    def _validate_y(self, y, sample_weight=None):
+        pass
+
     def _fit_stage(self, i, X, y, raw_predictions, sample_weight, sample_mask,
                    random_state, X_csc=None, X_csr=None):
         """Fit another stage of n_classes_ trees to the boosting model. """
@@ -240,10 +244,12 @@ class BaseGradientBoosting(BaseEnsemble, metaclass=ABCMeta):
         else:
             loss_class = _gb_losses.LOSS_FUNCTIONS[self.loss]
 
-        if self.loss in ('huber', 'quantile'):
-            self.loss_ = loss_class(self.n_classes_, self.alpha)
-        else:
+        if is_classifier(self):
             self.loss_ = loss_class(self.n_classes_)
+        elif self.loss in ("huber", "quantile"):
+            self.loss_ = loss_class(self.alpha)
+        else:
+            self.loss_ = loss_class()
 
         if not (0.0 < self.subsample <= 1.0):
             raise ValueError("subsample must be in (0,1] but "
@@ -265,11 +271,9 @@ class BaseGradientBoosting(BaseEnsemble, metaclass=ABCMeta):
 
         if isinstance(self.max_features, str):
             if self.max_features == "auto":
-                # if is_classification
-                if self.n_classes_ > 1:
+                if is_classifier(self):
                     max_features = max(1, int(np.sqrt(self.n_features_)))
-                else:
-                    # is regression
+                else:  # is regression
                     max_features = self.n_features_
             elif self.max_features == "sqrt":
                 max_features = max(1, int(np.sqrt(self.n_features_)))
@@ -405,7 +409,11 @@ class BaseGradientBoosting(BaseEnsemble, metaclass=ABCMeta):
         sample_weight = _check_sample_weight(sample_weight, X)
 
         y = column_or_1d(y, warn=True)
-        y = self._validate_y(y, sample_weight)
+        if is_classifier(self):
+            y = self._validate_y(y, sample_weight)
+        else:
+            y = self._validate_y(y)
 
         if self.n_iter_no_change is not None:
             stratify = y if is_classifier(self) else None
@@ -711,15 +719,6 @@ class BaseGradientBoosting(BaseEnsemble, metaclass=ABCMeta):
 
         return averaged_predictions
 
-    def _validate_y(self, y, sample_weight):
-        # 'sample_weight' is not utilised but is used for
-        # consistency with similar method _validate_y of GBC
-        self.n_classes_ = 1
-        if y.dtype.kind == 'O':
-            y = y.astype(DOUBLE)
-        # Default implementation
-        return y
-
     def apply(self, X):
         """Apply trees in the ensemble to X, return leaf indices.
@@ -1575,6 +1574,11 @@ class GradientBoostingRegressor(RegressorMixin, BaseGradientBoosting):
             validation_fraction=validation_fraction,
             n_iter_no_change=n_iter_no_change, tol=tol, ccp_alpha=ccp_alpha)
 
+    def _validate_y(self, y):
+        if y.dtype.kind == 'O':
+            y = y.astype(DOUBLE)
+        return y
+
     def predict(self, X):
         """Predict regression target for X.
diff --git a/sklearn/ensemble/_gb_losses.py b/sklearn/ensemble/_gb_losses.py
index fa33cc39ab..fae9ccfcd2 100644
--- a/sklearn/ensemble/_gb_losses.py
+++ b/sklearn/ensemble/_gb_losses.py
@@ -152,11 +152,8 @@ class RegressionLossFunction(LossFunction, metaclass=ABCMeta):
     n_classes : int
         Number of classes.
     """
-    def __init__(self, n_classes):
-        if n_classes != 1:
-            raise ValueError("``n_classes`` must be 1 for regression but "
-                             "was %r" % n_classes)
-        super().__init__(n_classes)
+    def __init__(self):
+        super().__init__(n_classes=1)
 
     def check_init_estimator(self, estimator):
         """Make sure estimator has the required fit and predict methods.
@@ -340,8 +337,8 @@ class HuberLossFunction(RegressionLossFunction):
     Machine, The Annals of Statistics, Vol. 29, No. 5, 2001.
     """
-    def __init__(self, n_classes, alpha=0.9):
-        super().__init__(n_classes)
+    def __init__(self, alpha=0.9):
+        super().__init__()
         self.alpha = alpha
         self.gamma = None
@@ -445,8 +442,8 @@ class QuantileLossFunction(RegressionLossFunction):
     alpha : float, default=0.9
         The percentile.
     """
-    def __init__(self, n_classes, alpha=0.9):
-        super().__init__(n_classes)
+    def __init__(self, alpha=0.9):
+        super().__init__()
         self.alpha = alpha
         self.percentile = alpha * 100
```

glemaitre (Member):

Since _validate_y differs between the classifier and the regressor, it does not make sense to keep it in the base class. We can make it an abstract method so that it must be implemented by the inheriting classes. We also need to change the regression losses: they should not take any n_classes parameter, because there are no classes in regression. I think almost all the tests should pass with this patch; I will check.

glemaitre (Member):

There is an additional change for max_features, where we can check is_classifier(self) instead of n_classes_.
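For illustration, is_classifier is a public helper from sklearn.base, so the max_features="auto" branch no longer needs to consult n_classes_:

```python
# Illustration only: is_classifier distinguishes the two estimators.
from sklearn.base import is_classifier
from sklearn.ensemble import (GradientBoostingClassifier,
                              GradientBoostingRegressor)

print(is_classifier(GradientBoostingClassifier()))  # True  -> sqrt heuristic
print(is_classifier(GradientBoostingRegressor()))   # False -> all features
```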

glemaitre (Member):

We also need to update the tests of the loss functions: the parameter n_classes should no longer be passed to the regression losses.
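Roughly, the change to those test calls would look like the following sketch, assuming the new constructors from the patch above:

```python
# Sketch of the updated loss-function test calls (private module).
from sklearn.ensemble._gb_losses import (HuberLossFunction,
                                         QuantileLossFunction)

# before the patch: HuberLossFunction(1, alpha=0.9)
# after the patch, regression losses take no n_classes argument:
huber = HuberLossFunction(alpha=0.9)
quantile = QuantileLossFunction(alpha=0.9)
```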

glemaitre (Member) left a comment:

    Let's go for the next step

glemaitre (Member) left a comment:

It starts to look good. We will need an entry in whats_new/v_0_24.rst, in the ensemble section, to announce the deprecation.
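A sketch of what such a changelog entry could look like (exact wording assumed), following the whats_new conventions used in the scikit-learn docs:

```rst
.. hedged sketch of the changelog entry; wording is assumed

:mod:`sklearn.ensemble`
.......................

- |API| The attribute ``n_classes_`` of
  :class:`ensemble.GradientBoostingRegressor` is deprecated in 0.24 and
  will be removed in 0.26. :pr:`17702` by :user:`simonamaggio`.
```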

simonamaggio (Contributor, Author):

> whats_new/v_0_24.rst

Where is this file? I cannot find it.

ogrisel (Member) commented Jun 26, 2020:

> whats_new/v_0_24.rst
>
> Where is this file? I cannot find it.

    It's doc/whats_new/v_0_24.rst.

If you use VS Code, press Ctrl-P and type "24" to navigate to it quickly.

thomasjpfan (Member) left a comment:

    Thank you for the PR @simonamaggio !

    @glemaitre glemaitre changed the title Deprecate n_classes_ in GradientBoostingRegressor DEP Deprecate n_classes_ in GradientBoostingRegressor Jul 10, 2020
glemaitre (Member) left a comment:

    LGTM. @thomasjpfan are all changes fine with you?

    @deprecated("Attribute n_classes_ was deprecated " # type: ignore
    "in version 0.24 and will be removed in 0.26.")
    @property
    def n_classes_(self):
Member:

    This passed our common test? This seems like n_classes_ is defined without calling fit.

Contributor (Author):

    What is the common test? I tested with pytest -v sklearn/ensemble/tests/test_gradient_boosting.py and all tests passed.

Member:

We test all our estimators, including GradientBoostingRegressor, with the common tests (in sklearn/tests/test_common.py). The one I am thinking about is:

```python
def check_no_attributes_set_in_init(name, estimator_orig):
    """Check setting during init. """
```

More specifically, n_classes_ is defined even if the estimator is not fitted:

```python
from sklearn.ensemble import GradientBoostingRegressor

gb = GradientBoostingRegressor()
gb.n_classes_
# 1
```

    From looking at the test, it looks like vars(estimator) does not pick up the n_classes_ property.
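That is indeed how properties behave: a property lives on the class, not in the instance __dict__, so vars() cannot see it. A minimal illustration with a toy class:

```python
# Toy example: a property is a class attribute, so vars(instance) stays
# empty and check_no_attributes_set_in_init cannot flag it.
class Toy:
    @property
    def n_classes_(self):
        return 1


toy = Toy()
print(vars(toy))        # {}  -> the common test passes
print(toy.n_classes_)   # 1   -> yet the attribute resolves before fit
```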

Contributor (Author):

GradientBoostingRegressor passed all common tests. In particular:

```
sklearn/tests/test_common.py::test_estimators[GradientBoostingRegressor()-check_no_attributes_set_in_init] PASSED [ 26%]
```

Member:

I think it is incorrectly passing. n_classes_ should not be defined before fit is called.

As for this PR, let's update the method:

```python
    # requires: from sklearn.utils.validation import check_is_fitted
    #           from sklearn.exceptions import NotFittedError
    @property
    def n_classes_(self):
        try:
            check_is_fitted(self)
        except NotFittedError as nfe:
            raise AttributeError(
                "{} object has no n_classes_ attribute."
                .format(self.__class__.__name__)
            ) from nfe
```

    and then have a test to make sure the AttributeError is raised.

    We can update the check_no_attributes_set_in_init to catch these cases in another PR.
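Such a test could look like the following sketch (the test name is hypothetical; the exact message is settled later in this thread):

```python
# Sketch of the requested test; assumes the patched property above.
import pytest
from sklearn.ensemble import GradientBoostingRegressor


def test_n_classes_raises_if_not_fitted():
    gbr = GradientBoostingRegressor()
    msg = "GradientBoostingRegressor object has no n_classes_ attribute."
    with pytest.raises(AttributeError, match=msg):
        gbr.n_classes_
```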

Contributor (Author):

I updated the method as suggested. However, I'm not sure how to create the issue regarding check_no_attributes_set_in_init.

Member:

    I will look into check_no_attributes_set_in_init, this PR is almost ready.

    Can we add a test to make sure AttributeError is raised when GradientBoostingRegressor is not fitted yet?

thomasjpfan (Member) left a comment:

    Thank you, this PR is almost ready!

    @deprecated("Attribute n_classes_ was deprecated " # type: ignore
    "in version 0.24 and will be removed in 0.26.")
    @property
    def n_classes_(self):
    Copy link
    Member

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    I will look into check_no_attributes_set_in_init, this PR is almost ready.

    Can we add a test to make sure AttributeError is raised when GradientBoostingRegressor is not fitted yet?

simonamaggio (Contributor, Author) commented Jul 23, 2020:

> Can we add a test to make sure AttributeError is raised when GradientBoostingRegressor is not fitted yet?

Done now. Sorry, I didn't get it the first time.

thomasjpfan (Member) left a comment:

    Minor comment.

    Otherwise LGTM

Comment on lines 1329 to 1330:

```python
    msg = f"{GradientBoostingRegressor.__name__} object " \
          "has no n_classes_ attribute."
```

Member:

Suggested change:

```diff
-    msg = f"{GradientBoostingRegressor.__name__} object " \
-          "has no n_classes_ attribute."
+    msg = ("GradientBoostingRegressor object "
+           "has no n_classes_ attribute.")
```

    @glemaitre glemaitre self-assigned this Aug 13, 2020
glemaitre (Member):

I solved the conflict and made the small change requested by @thomasjpfan. I will merge once the CIs are happy.

    @glemaitre glemaitre merged commit 989613e into scikit-learn:master Aug 13, 2020
glemaitre (Member):

    Thanks @simonamaggio

jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request on Oct 22, 2020: …7702)
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
    Successfully merging this pull request may close these issues.

    Deprecate n_classes_ in GradientBoostingRegressor