8000 DEP Deprecate n_classes_ in GradientBoostingRegressor by simonamaggio · Pull Request #17702 · scikit-learn/scikit-learn · GitHub

Merged: 29 commits merged into scikit-learn:master on Aug 13, 2020

    Conversation

simonamaggio (Contributor) commented Jun 24, 2020:

    Fix #17673

In the base class, use a private `_n_classes` attribute; in the classifier, create the public attribute `n_classes_`.
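A minimal sketch of that approach, not the actual PR code: the class name is hypothetical, and the decorator is `sklearn.utils.deprecated`, the same one used in the review threads below.

```python
# Sketch only: the regressor stores a private _n_classes and exposes a
# deprecated read-only n_classes_ property for backward compatibility.
from sklearn.utils import deprecated


class RegressorSketch:  # hypothetical class name for illustration
    def fit(self, X, y):
        # only the private attribute is set during fit
        self._n_classes = 1
        return self

    @deprecated("Attribute n_classes_ was deprecated "  # type: ignore
                "in version 0.24 and will be removed in 0.26.")
    @property
    def n_classes_(self):
        return self._n_classes
```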

glemaitre (Member):

    I think that there is a cleaner solution which might require a bit more refactoring.

glemaitre (Member) commented Jun 24, 2020:
```diff
diff --git a/sklearn/ensemble/_gb.py b/sklearn/ensemble/_gb.py
index 179eb81816..f3223c2d36 100644
--- a/sklearn/ensemble/_gb.py
+++ b/sklearn/ensemble/_gb.py
@@ -165,6 +165,10 @@ class BaseGradientBoosting(BaseEnsemble, metaclass=ABCMeta):
         self.n_iter_no_change = n_iter_no_change
         self.tol = tol
 
+    @abstractmethod
+    def _validate_y(self, y, sample_weight=None):
+        pass
+
     def _fit_stage(self, i, X, y, raw_predictions, sample_weight, sample_mask,
                    random_state, X_csc=None, X_csr=None):
         """Fit another stage of n_classes_ trees to the boosting model. """
@@ -240,10 +244,12 @@ class BaseGradientBoosting(BaseEnsemble, metaclass=ABCMeta):
         else:
             loss_class = _gb_losses.LOSS_FUNCTIONS[self.loss]
 
-        if self.loss in ('huber', 'quantile'):
-            self.loss_ = loss_class(self.n_classes_, self.alpha)
-        else:
+        if is_classifier(self):
             self.loss_ = loss_class(self.n_classes_)
+        elif self.loss in ("huber", "quantile"):
+            self.loss_ = loss_class(self.alpha)
+        else:
+            self.loss_ = loss_class()
 
         if not (0.0 < self.subsample <= 1.0):
             raise ValueError("subsample must be in (0,1] but "
@@ -265,11 +271,9 @@ class BaseGradientBoosting(BaseEnsemble, metaclass=ABCMeta):
 
         if isinstance(self.max_features, str):
             if self.max_features == "auto":
-                # if is_classification
-                if self.n_classes_ > 1:
+                if is_classifier(self):
                     max_features = max(1, int(np.sqrt(self.n_features_)))
-                else:
-                    # is regression
+                else:  # is regression
                     max_features = self.n_features_
             elif self.max_features == "sqrt":
                 max_features = max(1, int(np.sqrt(self.n_features_)))
@@ -405,7 +409,11 @@ class BaseGradientBoosting(BaseEnsemble, metaclass=ABCMeta):
         sample_weight = _check_sample_weight(sample_weight, X)
 
         y = column_or_1d(y, warn=True)
-        y = self._validate_y(y, sample_weight)
+        if is_classifier(self):
+            y = self._validate_y(y, sample_weight)
+        else:
+            y = self._validate_y(y)
 
         if self.n_iter_no_change is not None:
             stratify = y if is_classifier(self) else None
@@ -711,15 +719,6 @@ class BaseGradientBoosting(BaseEnsemble, metaclass=ABCMeta):
 
         return averaged_predictions
 
-    def _validate_y(self, y, sample_weight):
-        # 'sample_weight' is not utilised but is used for
-        # consistency with similar method _validate_y of GBC
-        self.n_classes_ = 1
-        if y.dtype.kind == 'O':
-            y = y.astype(DOUBLE)
-        # Default implementation
-        return y
-
     def apply(self, X):
         """Apply trees in the ensemble to X, return leaf indices.
@@ -1575,6 +1574,11 @@ class GradientBoostingRegressor(RegressorMixin, BaseGradientBoosting):
             validation_fraction=validation_fraction,
             n_iter_no_change=n_iter_no_change, tol=tol, ccp_alpha=ccp_alpha)
 
+    def _validate_y(self, y):
+        if y.dtype.kind == 'O':
+            y = y.astype(DOUBLE)
+        return y
+
     def predict(self, X):
         """Predict regression target for X.
diff --git a/sklearn/ensemble/_gb_losses.py b/sklearn/ensemble/_gb_losses.py
index fa33cc39ab..fae9ccfcd2 100644
--- a/sklearn/ensemble/_gb_losses.py
+++ b/sklearn/ensemble/_gb_losses.py
@@ -152,11 +152,8 @@ class RegressionLossFunction(LossFunction, metaclass=ABCMeta):
     n_classes : int
         Number of classes.
     """
-    def __init__(self, n_classes):
-        if n_classes != 1:
-            raise ValueError("``n_classes`` must be 1 for regression but "
-                             "was %r" % n_classes)
-        super().__init__(n_classes)
+    def __init__(self):
+        super().__init__(n_classes=1)
 
     def check_init_estimator(self, estimator):
         """Make sure estimator has the required fit and predict methods.
@@ -340,8 +337,8 @@ class HuberLossFunction(RegressionLossFunction):
     Machine, The Annals of Statistics, Vol. 29, No. 5, 2001.
     """
-    def __init__(self, n_classes, alpha=0.9):
-        super().__init__(n_classes)
+    def __init__(self, alpha=0.9):
+        super().__init__()
         self.alpha = alpha
         self.gamma = None
@@ -445,8 +442,8 @@ class QuantileLossFunction(RegressionLossFunction):
     alpha : float, default=0.9
         The percentile.
     """
-    def __init__(self, n_classes, alpha=0.9):
-        super().__init__(n_classes)
+    def __init__(self, alpha=0.9):
+        super().__init__()
         self.alpha = alpha
         self.percentile = alpha * 100
```

glemaitre (Member):

Since _validate_y differs between the classifier and the regressor, it does not make sense to keep it in the base class. We can make it an abstract method so that it must be implemented by the inheriting classes. We also need to change the regression losses: they should not take any n_classes parameter, because there are no classes in regression. I think almost all the tests should pass with this patch; I will check.

glemaitre (Member):

There is an additional change for max_features, where we can check is_classifier(self) instead of n_classes_.
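For illustration, is_classifier is a public helper from sklearn.base, so the max_features="auto" branch no longer needs to consult n_classes_:

```python
# Illustration only: is_classifier distinguishes the two estimators.
from sklearn.base import is_classifier
from sklearn.ensemble import (GradientBoostingClassifier,
                              GradientBoostingRegressor)

print(is_classifier(GradientBoostingClassifier()))  # True  -> sqrt heuristic
print(is_classifier(GradientBoostingRegressor()))   # False -> all features
```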

glemaitre (Member):

We also need to update the tests of the loss functions: the parameter n_classes should no longer be passed to the regression losses.
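Roughly, the change to those test calls would look like the following sketch, assuming the new constructors from the patch above:

```python
# Sketch of the updated loss-function test calls (private module).
from sklearn.ensemble._gb_losses import (HuberLossFunction,
                                         QuantileLossFunction)

# before the patch: HuberLossFunction(1, alpha=0.9)
# after the patch, regression losses take no n_classes argument:
huber = HuberLossFunction(alpha=0.9)
quantile = QuantileLossFunction(alpha=0.9)
```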

glemaitre (Member) left a comment:

    Let's go for the next step

glemaitre (Member) left a comment:

It starts to look good. We will need an entry in whats_new/v_0_24.rst, in the ensemble section, to announce the deprecation.
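A sketch of what such a changelog entry could look like (exact wording assumed), following the whats_new conventions used in the scikit-learn docs:

```rst
.. hedged sketch of the changelog entry; wording is assumed

:mod:`sklearn.ensemble`
.......................

- |API| The attribute ``n_classes_`` of
  :class:`ensemble.GradientBoostingRegressor` is deprecated in 0.24 and
  will be removed in 0.26. :pr:`17702` by :user:`simonamaggio`.
```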

simonamaggio (Contributor, Author):

> whats_new/v_0_24.rst

Where is this file? I cannot find it.

ogrisel (Member) commented Jun 26, 2020:

> whats_new/v_0_24.rst
>
> Where is this file? I cannot find it.

    It's doc/whats_new/v_0_24.rst.

If you use VS Code, press Ctrl-P and type "24" to navigate to it quickly.

thomasjpfan (Member) left a comment:

    Thank you for the PR @simonamaggio !

    @glemaitre glemaitre changed the title Deprecate n_classes_ in GradientBoostingRegressor DEP Deprecate n_classes_ in GradientBoostingRegressor Jul 10, 2020
glemaitre (Member) left a comment:

    LGTM. @thomasjpfan are all changes fine with you?

    @deprecated("Attribute n_classes_ was deprecated " # type: ignore
    "in version 0.24 and will be removed in 0.26.")
    @property
    def n_classes_(self):
Member:

    This passed our common test? This seems like n_classes_ is defined without calling fit.

Contributor (Author):

    What is the common test? I tested with pytest -v sklearn/ensemble/tests/test_gradient_boosting.py and all tests passed.

Member:

We test all our estimators, including GradientBoostingRegressor, with the common tests (in sklearn/tests/test_common.py). The one I am thinking about is:

```python
def check_no_attributes_set_in_init(name, estimator_orig):
    """Check setting during init. """
```

More specifically, n_classes_ is defined even if the estimator is not fitted:

```python
from sklearn.ensemble import GradientBoostingRegressor

gb = GradientBoostingRegressor()
gb.n_classes_
# 1
```

    From looking at the test, it looks like vars(estimator) does not pick up the n_classes_ property.
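That is indeed how properties behave: a property lives on the class, not in the instance __dict__, so vars() cannot see it. A minimal illustration with a toy class:

```python
# Toy example: a property is a class attribute, so vars(instance) stays
# empty and check_no_attributes_set_in_init cannot flag it.
class Toy:
    @property
    def n_classes_(self):
        return 1


toy = Toy()
print(vars(toy))        # {}  -> the common test passes
print(toy.n_classes_)   # 1   -> yet the attribute resolves before fit
```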

Contributor (Author):

GradientBoostingRegressor passed all common tests. In particular:

```
sklearn/tests/test_common.py::test_estimators[GradientBoostingRegressor()-check_no_attributes_set_in_init] PASSED [ 26%]
```

Member:

I think it is incorrectly passing. n_classes_ should not be defined before fit is called.

As for this PR, let's update the method:

```python
    # requires: from sklearn.utils.validation import check_is_fitted
    #           from sklearn.exceptions import NotFittedError
    @property
    def n_classes_(self):
        try:
            check_is_fitted(self)
        except NotFittedError as nfe:
            raise AttributeError(
                "{} object has no n_classes_ attribute."
                .format(self.__class__.__name__)
            ) from nfe
```

    and then have a test to make sure the AttributeError is raised.

    We can update the check_no_attributes_set_in_init to catch these cases in another PR.
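Such a test could look like the following sketch (the test name is hypothetical; the exact message is settled later in this thread):

```python
# Sketch of the requested test; assumes the patched property above.
import pytest
from sklearn.ensemble import GradientBoostingRegressor


def test_n_classes_raises_if_not_fitted():
    gbr = GradientBoostingRegressor()
    msg = "GradientBoostingRegressor object has no n_classes_ attribute."
    with pytest.raises(AttributeError, match=msg):
        gbr.n_classes_
```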

Contributor (Author):

I updated the method as suggested. However, I'm not sure how to create the issue regarding check_no_attributes_set_in_init.

Member:

    I will look into check_no_attributes_set_in_init, this PR is almost ready.

    Can we add a test to make sure AttributeError is raised when GradientBoostingRegressor is not fitted yet?

thomasjpfan (Member) left a comment:

    Thank you, this PR is almost ready!

    @deprecated("Attribute n_classes_ was deprecated " # type: ignore
    "in version 0.24 and will be removed in 0.26.")
    @property
    def n_classes_(self):
    Copy link
    Member

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    I will look into check_no_attributes_set_in_init, this PR is almost ready.

    Can we add a test to make sure AttributeError is raised when GradientBoostingRegressor is not fitted yet?

simonamaggio (Contributor, Author) commented Jul 23, 2020:

> Can we add a test to make sure AttributeError is raised when GradientBoostingRegressor is not fitted yet?

Done now. Sorry, I didn't get it the first time.

thomasjpfan (Member) left a comment:

    Minor comment.

    Otherwise LGTM

Comment on lines 1329 to 1330:

```python
    msg = f"{GradientBoostingRegressor.__name__} object " \
          "has no n_classes_ attribute."
```

Member:

Suggested change:

```diff
-    msg = f"{GradientBoostingRegressor.__name__} object " \
-          "has no n_classes_ attribute."
+    msg = ("GradientBoostingRegressor object "
+           "has no n_classes_ attribute.")
```

    @glemaitre glemaitre self-assigned this Aug 13, 2020
glemaitre (Member):

I solved the conflict and made the small change requested by @thomasjpfan. I will merge once the CIs are happy.

    @glemaitre glemaitre merged commit 989613e into scikit-learn:master Aug 13, 2020
glemaitre (Member):

    Thanks @simonamaggio

jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request on Oct 22, 2020: …7702)
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
    Successfully merging this pull request may close these issues.

    Deprecate n_classes_ in GradientBoostingRegressor