FEA Add metadata routing for RFE and RFECV #29312
Conversation
sklearn/feature_selection/_rfe.py (outdated)

    )
    router.add(
        scorer=self._get_scorer(),
        method_mapping=MethodMapping().add(caller="fit", callee="score"),
Shouldn't the score method also call the scorer's score?
Does this have a score method?
Edit: I see this inherits from RFE, so it has a score method, but is that applicable, given that the scorer isn't used in it? That score calls the estimator's score method. I added score and predict in the estimator part.
This feels a bit odd now. It's odd that we use the given scorer in fit, but not in score; that seems like something which needs to be fixed.
WDYT @glemaitre
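For reference, a minimal sketch of what routing the scorer for both methods could look like (assuming the surrounding get_metadata_routing body; MethodMapping.add returns the mapping, so the calls chain):

    # hypothetical sketch, not the final diff: route the scorer's score
    # callee from both the fit and score callers
    router.add(
        scorer=self._get_scorer(),
        method_mapping=(
            MethodMapping()
            .add(caller="fit", callee="score")
            .add(caller="score", callee="score")
        ),
    )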
Yes, I would consider this a bug, fix it here, and add a changelog entry stating that the estimator now uses the scoring metadata when calling score.
We had this pattern in LogisticRegressionCV and I assume we considered it a bug as well.
Do you mean we need to add a score method which uses the scorer?
Yep, similarly to sklearn/linear_model/_logistic.py, lines 2190 to 2235 at 2ffe206:
    def score(self, X, y, sample_weight=None, **score_params):
        """Score using the `scoring` option on the given test data and labels.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Test samples.

        y : array-like of shape (n_samples,)
            True labels for X.

        sample_weight : array-like of shape (n_samples,), default=None
            Sample weights.

        **score_params : dict
            Parameters to pass to the `score` method of the underlying scorer.

            .. versionadded:: 1.4

        Returns
        -------
        score : float
            Score of self.predict(X) w.r.t. y.
        """
        _raise_for_params(score_params, self, "score")

        scoring = self._get_scorer()
        if _routing_enabled():
            routed_params = process_routing(
                self,
                "score",
                sample_weight=sample_weight,
                **score_params,
            )
        else:
            routed_params = Bunch()
            routed_params.scorer = Bunch(score={})
            if sample_weight is not None:
                routed_params.scorer.score["sample_weight"] = sample_weight

        return scoring(
            self,
            X,
            y,
            **routed_params.scorer.score,
        )
A couple of comments. But it looks pretty good.
@@ -668,6 +758,7 @@ class RFECV(RFE):
         "n_jobs": [None, Integral],
     }
     _parameter_constraints.pop("n_features_to_select")
+    __metadata_request__fit = {"groups": metadata_routing.UNUSED}
I'm surprised that we don't have a linter complaining here, because it is unused. If nothing is complaining then this is fine; otherwise we can do:

    __metadata_request__fit = {"groups": metadata_routing.UNUSED}  # noqa
Do class attributes need to be used? Linters don't complain if we don't use a defined method of a class for instance.
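(For context, a rough, self-contained illustration of why the attribute is effectively "used": the routing machinery discovers __metadata_request__<method> class attributes by introspection, roughly as below. This is illustrative, not the actual sklearn source.)

    import inspect

    class Requester:
        # name-mangled to _Requester__metadata_request__fit, but the
        # "__metadata_request__" substring survives the mangling
        __metadata_request__fit = {"groups": "unused"}

    def collect_requests(cls):
        # walk the MRO and collect per-method request declarations
        requests = {}
        for klass in inspect.getmro(cls):
            for name, value in vars(klass).items():
                if "__metadata_request__" in name:
                    method = name.split("__metadata_request__")[-1]
                    requests[method] = value
        return requests

    print(collect_requests(Requester))  # {'fit': {'groups': 'unused'}}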
sklearn/feature_selection/_rfe.py (outdated)

        return router

    def _get_scorer(self):
        return check_scoring(self.estimator, scoring=self.scoring)
I think it is safer to use get_scorer here:

    if self.scoring is None:
        scoring = "accuracy" if is_classifier(self.estimator) else "r2"
    else:
        scoring = self.scoring
    return get_scorer(scoring)

My issue with check_scoring is triggered when self.scoring is None: in that case we delegate to the estimator.score method and thus don't return a scorer. I'm wondering if there is any side effect when returning estimator.score, which does not have score_params in its signature. I assume we should get the same error as when calling a scorer with an estimator that does not support it.
Yeah, _PassThroughScorer breaks three tests when scoring is None because the recursion depth is exceeded, meaning it gets stuck in an endless loop. The code you suggested seems fine; we just have to tweak the MockClassifier defined in the tests a bit to make them pass. Could you have a look at the latest changes to see if they look alright, and then I can add the additional changelog entry?
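A stripped-down illustration of that recursion (hypothetical class names, not the sklearn implementation):

    class PassthroughScorer:
        # what check_scoring effectively returns for scoring=None:
        # delegate back to the estimator's own score method
        def __call__(self, estimator, X, y):
            return estimator.score(X, y)

    class Selector:
        def _get_scorer(self):
            return PassthroughScorer()

        def score(self, X, y):
            # the new score method calls the scorer, which calls
            # self.score again, and so on
            return self._get_scorer()(self, X, y)

    Selector().score([[0.0]], [0])  # RecursionError: maximum recursion depth exceeded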
            .add(caller="score", callee="score"),
        )
        return router


class RFECV(RFE):
Side note: We inherit from RFE and we create an RFE object inside the class. I have the impression that this is a bit of an anti-pattern :)
RFECV is such an odd estimator. We also do things like

    rfe.fit(X, y)
    # Set final attributes
    ...
    self.estimator_ = clone(self.estimator)
    self.estimator_.fit(self._transform(X), y)

I'm like, WUT? How many times do we need to fit the same thing?
Tests failing here.
Yes, I need to check and fix 😄
@adrinjalali Is it possible to remove DelegatorData("RFECV", RFECV, skip_methods=["transform", "inverse_transform"]) from test_metaestimator_delegation, because of how we are handling the scoring for RFECV?
Looking at the stacktrace:

    sklearn/metrics/_scorer.py:288: in __call__
        return self._score(partial(_cached_call, None), estimator, X, y_true, **_kwargs)
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    self = make_scorer(r2_score, response_method='predict')
    method_caller = functools.partial(<function _cached_call at 0x7eec9e9cdee0>, None)
    estimator = SubEstimator()
    X = array([[-3.92679910e-02,  1.31911756e-01, -2.11205984e-01,
                -1.21414740e+00,  1.05004467e+00, -4.21327587e-01,
                ...                               -1.20373519e+00,
                -3.24730265e-01,  4.17043503e-01,  4.58931008e-02,
                 1.34803578e+00,  4.98726958e-01]])
    y_true = array([0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1])
    kwargs = {}, pos_label = None
    response_method = functools.partial(<function test_metaestimator_delegation.<locals>.SubEstimator.predict at 0x7eec9d8c6a20>, SubEstimator())

        def _score(self, method_caller, estimator, X, y_true, **kwargs):
            """Evaluate the response method of `estimator` on `X` and `y_true`.

            Parameters
            ----------
            method_caller : callable
                Returns predictions given an estimator, method name, and other
                arguments, potentially caching results.
            estimator : object
                Trained estimator to use for scoring.
            X : {array-like, sparse matrix}
                Test data that will be fed to clf.decision_function or
                clf.predict_proba.
            y_true : array-like
                Gold standard target values for X. These must be class labels,
                not decision function values.
            **kwargs : dict
                Other parameters passed to the scorer. Refer to
                :func:`set_score_request` for more details.

            Returns
            -------
            score : float
                Score function applied to prediction of estimator on X.
            """
            self._warn_overlap(
                message=(
                    "There is an overlap between set kwargs of this scorer instance and"
                    " passed metadata. Please pass them either as kwargs to `make_scorer`"
                    " or metadata, but not both."
                ),
                kwargs=kwargs,
            )
            pos_label = None if is_regressor(estimator) else self._get_pos_label()
            response_method = _check_response_method(estimator, self._response_method)
            y_pred = method_caller(
    >           estimator, response_method.__name__, X, pos_label=pos_label
            )
    E       AttributeError: 'functools.partial' object has no attribute '__name__'. Did you mean: '__ne__'?

It looks like we're mishandling functools.partial here: _check_response_method returns a partial for the sub-estimator's predict, and _score assumes the response method has a __name__.
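The immediate failure is reproducible in isolation; functools.partial objects don't carry a __name__ attribute (only the wrapped function does):

    from functools import partial

    def predict(est, X):
        return X

    bound = partial(predict, object())
    print(bound.func.__name__)  # 'predict': the wrapped function has a name
    print(bound.__name__)       # AttributeError: 'functools.partial' object
                                # has no attribute '__name__'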
        _raise_for_params(score_params, self, "score")

        scoring = self._get_scorer()
        if _routing_enabled():
            routed_params = process_routing(self, "score", **score_params)
not tested?
Added score to the scorer_routing_methods in test_metaestimator_metadata_routing.
LGTM. Maybe @adam2392 could have a look?
LGTM, except one part where I just can't parse the diff!
@@ -55,7 +55,9 @@ def __init__(
         skip_methods=["score"],
     ),
     DelegatorData("RFE", RFE, skip_methods=["transform", "inverse_transform"]),
-    DelegatorData("RFECV", RFECV, skip_methods=["transform", "inverse_transform"]),
+    DelegatorData(
I know this isn't necessarily part of your PR, but I'm unsure what the purpose of DELEGATING_METAESTIMATORS is, and can't easily figure it out from glancing at the code.
Is it worth adding some comments somewhere above to indicate what's going on here?
cc: @adrinjalali
Yeah, this is cryptic unless one reads the actual test. We're checking that these methods are only available if the sub-estimator has them, and since we're now implementing score without caring about the sub-estimator, we can skip it here. But a comment somewhere wouldn't hurt.
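For the record, a simplified sketch (hypothetical classes, not the real test) of the delegation property being checked:

    class Meta:
        # a metaestimator should expose a delegated method only when
        # the wrapped sub-estimator has it
        def __init__(self, estimator):
            self.estimator = estimator

        def __getattr__(self, name):
            if name in {"predict", "score", "transform"}:
                return getattr(self.estimator, name)
            raise AttributeError(name)

    class HasPredict:
        def predict(self, X):
            return X

    class NoPredict:
        pass

    assert hasattr(Meta(HasPredict()), "predict")
    assert not hasattr(Meta(NoPredict()), "predict")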
Added a comment.
Reviewers: adrinjalali, glemaitre, adam2392
Successfully merging this pull request may close these issues: RFE/RFECV doesn't work with sample weights.
Reference Issues/PRs
Towards #22893
Fixes #7308
What does this implement/fix? Explain your changes.
Any other comments?
CC: @adrinjalali @glemaitre