DEP Deprecate n_classes_ in GradientBoostingRegressor by simonamaggio · Pull Request #17702 · scikit-learn/scikit-learn
DEP Deprecate n_classes_ in GradientBoostingRegressor #17702


Merged
merged 29 commits into scikit-learn:master on Aug 13, 2020

Conversation

simonamaggio
Contributor
@simonamaggio simonamaggio commented Jun 24, 2020

Fix #17673

In the base class, use a private _n_classes attribute and, for the classifier, create the public attribute n_classes_.
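For context, here is a minimal, self-contained sketch of that pattern (a toy class for illustration only; the actual implementation lives in sklearn/ensemble/_gb.py):

```python
import warnings


class ToyRegressor:
    """Toy stand-in for GradientBoostingRegressor, illustration only."""

    def fit(self, X, y):
        # Internal bookkeeping stays private.
        self._n_classes = 1
        return self

    @property
    def n_classes_(self):
        # Public attribute kept only for backward compatibility.
        warnings.warn(
            "Attribute n_classes_ was deprecated in version 0.24 "
            "and will be removed in 0.26.",
            FutureWarning,
        )
        return self._n_classes
```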

@glemaitre
Member

I think that there is a cleaner solution which might require a bit more refactoring.

@glemaitre
Member
glemaitre commented Jun 24, 2020
```diff
diff --git a/sklearn/ensemble/_gb.py b/sklearn/ensemble/_gb.py
index 179eb81816..f3223c2d36 100644
--- a/sklearn/ensemble/_gb.py
+++ b/sklearn/ensemble/_gb.py
@@ -165,6 +165,10 @@ class BaseGradientBoosting(BaseEnsemble, metaclass=ABCMeta):
         self.n_iter_no_change = n_iter_no_change
         self.tol = tol
 
+    @abstractmethod
+    def _validate_y(self, y, sample_weight=None):
+        pass
+
     def _fit_stage(self, i, X, y, raw_predictions, sample_weight, sample_mask,
                    random_state, X_csc=None, X_csr=None):
         """Fit another stage of n_classes_ trees to the boosting model. """
@@ -240,10 +244,12 @@ class BaseGradientBoosting(BaseEnsemble, metaclass=ABCMeta):
         else:
             loss_class = _gb_losses.LOSS_FUNCTIONS[self.loss]
 
-        if self.loss in ('huber', 'quantile'):
-            self.loss_ = loss_class(self.n_classes_, self.alpha)
-        else:
+        if is_classifier(self):
             self.loss_ = loss_class(self.n_classes_)
+        elif self.loss in ("huber", "quantile"):
+            self.loss_ = loss_class(self.alpha)
+        else:
+            self.loss_ = loss_class()
 
         if not (0.0 < self.subsample <= 1.0):
             raise ValueError("subsample must be in (0,1] but "
@@ -265,11 +271,9 @@ class BaseGradientBoosting(BaseEnsemble, metaclass=ABCMeta):
 
         if isinstance(self.max_features, str):
             if self.max_features == "auto":
-                # if is_classification
-                if self.n_classes_ > 1:
+                if is_classifier(self):
                     max_features = max(1, int(np.sqrt(self.n_features_)))
-                else:
-                    # is regression
+                else:  # is regression
                     max_features = self.n_features_
             elif self.max_features == "sqrt":
                 max_features = max(1, int(np.sqrt(self.n_features_)))
@@ -405,7 +409,11 @@ class BaseGradientBoosting(BaseEnsemble, metaclass=ABCMeta):
         sample_weight = _check_sample_weight(sample_weight, X)
 
         y = column_or_1d(y, warn=True)
-        y = self._validate_y(y, sample_weight)
+        if is_classifier(self):
+            y = self._validate_y(y, sample_weight)
+        else:
+            y = self._validate_y(y)
 
         if self.n_iter_no_change is not None:
             stratify = y if is_classifier(self) else None
@@ -711,15 +719,6 @@ class BaseGradientBoosting(BaseEnsemble, metaclass=ABCMeta):
 
         return averaged_predictions
 
-    def _validate_y(self, y, sample_weight):
-        # 'sample_weight' is not utilised but is used for
-        # consistency with similar method _validate_y of GBC
-        self.n_classes_ = 1
-        if y.dtype.kind == 'O':
-            y = y.astype(DOUBLE)
-        # Default implementation
-        return y
-
     def apply(self, X):
         """Apply trees in the ensemble to X, return leaf indices.
 
@@ -1575,6 +1574,11 @@ class GradientBoostingRegressor(RegressorMixin, BaseGradientBoosting):
             validation_fraction=validation_fraction,
             n_iter_no_change=n_iter_no_change, tol=tol, ccp_alpha=ccp_alpha)
 
+    def _validate_y(self, y):
+        if y.dtype.kind == 'O':
+            y = y.astype(DOUBLE)
+        return y
+
     def predict(self, X):
         """Predict regression target for X.
 
diff --git a/sklearn/ensemble/_gb_losses.py b/sklearn/ensemble/_gb_losses.py
index fa33cc39ab..fae9ccfcd2 100644
--- a/sklearn/ensemble/_gb_losses.py
+++ b/sklearn/ensemble/_gb_losses.py
@@ -152,11 +152,8 @@ class RegressionLossFunction(LossFunction, metaclass=ABCMeta):
     n_classes : int
         Number of classes.
     """
-    def __init__(self, n_classes):
-        if n_classes != 1:
-            raise ValueError("``n_classes`` must be 1 for regression but "
-                             "was %r" % n_classes)
-        super().__init__(n_classes)
+    def __init__(self):
+        super().__init__(n_classes=1)
 
     def check_init_estimator(self, estimator):
         """Make sure estimator has the required fit and predict methods.
@@ -340,8 +337,8 @@ class HuberLossFunction(RegressionLossFunction):
        Machine, The Annals of Statistics, Vol. 29, No. 5, 2001.
     """
 
-    def __init__(self, n_classes, alpha=0.9):
-        super().__init__(n_classes)
+    def __init__(self, alpha=0.9):
+        super().__init__()
         self.alpha = alpha
         self.gamma = None
 
@@ -445,8 +442,8 @@ class QuantileLossFunction(RegressionLossFunction):
     alpha : float, default=0.9
         The percentile.
     """
 
-    def __init__(self, n_classes, alpha=0.9):
-        super().__init__(n_classes)
+    def __init__(self, alpha=0.9):
+        super().__init__()
         self.alpha = alpha
         self.percentile = alpha * 100
```

@glemaitre
Member

So since _validate_y is different for the classifier and the regressor, it does not make sense to have it in the base class. We can make it an abstractmethod so that it must be implemented by the inheriting classes. Then we need to change the regression losses: we should not specify any n_classes because there are no classes in regression. I think that almost all the tests should pass with this patch. I will check.
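A minimal sketch of the abstractmethod pattern being proposed (illustrative class names, not the actual scikit-learn classes):

```python
from abc import ABCMeta, abstractmethod

import numpy as np


class BaseBoosting(metaclass=ABCMeta):
    @abstractmethod
    def _validate_y(self, y, sample_weight=None):
        """Each subclass must provide its own target validation."""


class RegressorBoosting(BaseBoosting):
    def _validate_y(self, y, sample_weight=None):
        # Regression only coerces the dtype; no class bookkeeping.
        return np.asarray(y, dtype=np.float64)


class ClassifierBoosting(BaseBoosting):
    def _validate_y(self, y, sample_weight=None):
        # Classification records the classes seen during fit.
        self.classes_, y = np.unique(y, return_inverse=True)
        self.n_classes_ = len(self.classes_)
        return y
```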

@glemaitre
Copy link
Member

We have an additional change for max_features, where we can check is_classifier(self) instead of n_classes_.
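For reference, is_classifier is a public helper from sklearn.base, so the proposed check reads, for example:

```python
from sklearn.base import is_classifier
from sklearn.ensemble import (GradientBoostingClassifier,
                              GradientBoostingRegressor)

print(is_classifier(GradientBoostingClassifier()))  # True
print(is_classifier(GradientBoostingRegressor()))   # False
```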

@glemaitre
Copy link
Member

We need to update the tests of the loss functions, where we should no longer specify the parameter n_classes for the regression losses.

Member
@glemaitre glemaitre left a comment

Let's go for the next step

Member
@glemaitre glemaitre left a comment

It starts to look good. We will need an entry in whats_new/v_0_24.rst, in the ensemble section, to announce the deprecation.

@simonamaggio
Contributor Author

whats_new/v_0_24.rst

Where is this file? I cannot find it.

@ogrisel
Member
ogrisel commented Jun 26, 2020

whats_new/v_0_24.rst

Where is this file? I cannot find it.

It's doc/whats_new/v_0_24.rst.

If you use VS Code, type Ctrl-P followed by "24" to navigate to it quickly.

Member
@thomasjpfan thomasjpfan left a comment

Thank you for the PR @simonamaggio !

@glemaitre glemaitre changed the title from "Deprecate n_classes_ in GradientBoostingRegressor" to "DEP Deprecate n_classes_ in GradientBoostingRegressor" on Jul 10, 2020
Member
@glemaitre glemaitre left a comment

LGTM. @thomasjpfan are all changes fine with you?

@deprecated("Attribute n_classes_ was deprecated " # type: ignore
"in version 0.24 and will be removed in 0.26.")
@property
def n_classes_(self):
Member

This passed our common tests? It seems like n_classes_ is defined without calling fit.

Contributor Author

What is the common test? I tested with pytest -v sklearn/ensemble/tests/test_gradient_boosting.py and all tests passed.

Member

We test all our estimators, including GradientBoostingRegressor, with common tests (in sklearn/tests/test_common.py). The one I am thinking about is:

```python
def check_no_attributes_set_in_init(name, estimator_orig):
    """Check setting during init."""
```

More specifically, n_classes_ is defined even if the estimator is not fitted:

```python
from sklearn.ensemble import GradientBoostingRegressor

gb = GradientBoostingRegressor()
gb.n_classes_
# 1
```

From looking at the test, it looks like vars(estimator) does not pick up the n_classes_ property.
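For illustration, a minimal standalone example of why vars() misses a property (a toy class, not scikit-learn code):

```python
class Toy:
    @property
    def n_classes_(self):
        return 1


t = Toy()
print(t.n_classes_)  # 1 -- the property is reachable
print(vars(t))       # {} -- but it lives on the class, not in the
                     # instance __dict__, which is what vars() inspects
```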

Contributor Author

GradientBoostingRegressor passed all common tests. In particular:

```
sklearn/tests/test_common.py::test_estimators[GradientBoostingRegressor()-check_no_attributes_set_in_init] PASSED [ 26%]
```

Member

I think it is incorrectly passing. n_classes_ should not be defined before fit is called.

As for this PR, let's update the method:

```python
    # (assumes: from sklearn.utils.validation import check_is_fitted
    #  and from sklearn.exceptions import NotFittedError)
    @property
    def n_classes_(self):
        try:
            check_is_fitted(self)
        except NotFittedError as nfe:
            raise AttributeError(
                "{} object has no n_classes_ attribute."
                .format(self.__class__.__name__)
            ) from nfe
```

and then have a test to make sure the AttributeError is raised.

We can update the check_no_attributes_set_in_init to catch these cases in another PR.
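A hedged sketch of such a test (pytest style; the test that actually landed in the PR may differ in its details):

```python
import pytest

from sklearn.ensemble import GradientBoostingRegressor


def test_n_classes_raises_when_not_fitted():
    gbr = GradientBoostingRegressor()
    msg = "GradientBoostingRegressor object has no n_classes_ attribute"
    with pytest.raises(AttributeError, match=msg):
        gbr.n_classes_
```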

Contributor Author

I updated the method as suggested. However, I'm not sure how to create the issue regarding check_no_attributes_set_in_init.

Member

I will look into check_no_attributes_set_in_init, this PR is almost ready.

Can we add a test to make sure AttributeError is raised when GradientBoostingRegressor is not fitted yet?

Member
@thomasjpfan thomasjpfan left a comment

Thank you, this PR is almost ready!

@deprecated("Attribute n_classes_ was deprecated " # type: ignore
"in version 0.24 and will be removed in 0.26.")
@property
def n_classes_(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will look into check_no_attributes_set_in_init, this PR is almost ready.

Can we add a test to make sure AttributeError is raised when GradientBoostingRegressor is not fitted yet?

@simonamaggio
Contributor Author
simonamaggio commented Jul 23, 2020

Can we add a test to make sure AttributeError is raised when GradientBoostingRegressor is not fitted yet?

Done now. Sorry, I didn't get it the first time.

Member
@thomasjpfan thomasjpfan left a comment

Minor comment.

Otherwise LGTM

Comment on lines 1329 to 1330:

```python
    msg = f"{GradientBoostingRegressor.__name__} object " \
          "has no n_classes_ attribute."
```
Member

Suggested change:

```diff
-    msg = f"{GradientBoostingRegressor.__name__} object " \
-          "has no n_classes_ attribute."
+    msg = ("GradientBoostingRegressor object "
+           "has no n_classes_ attribute.")
```

@glemaitre glemaitre self-assigned this Aug 13, 2020
@glemaitre
Member

I solved the conflict and made the small change requested by @thomasjpfan.
I will merge once the CIs are happy.

@glemaitre glemaitre merged commit 989613e into scikit-learn:master Aug 13, 2020
@glemaitre
Member

Thanks @simonamaggio

jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020
…7702)

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>