SLEP006 - Add Metadata Routing to LogisticRegressionCV by OmarManzoor · Pull Request #24498 · scikit-learn/scikit-learn

Conversation

OmarManzoor
Contributor

Reference Issues/PRs

Towards: #22893

What does this implement/fix? Explain your changes.

  • Added metadata routing to LogisticRegressionCV.
  • Added add_self to specify that LogisticRegressionCV is also a consumer.
  • Added routing for the cv and score methods.

Any other comments?

  • I might need some guidance on where to add further tests for this, since it is not exactly a meta-estimator.

Member
@jnothman jnothman left a comment


Thanks for this @omar! Nice work!
I agree it's hard to know what to test, and we should probably define expectations for this family of PRs.

I'd love to see a test, or an example, that uses a cv=GroupKFold(n_splits=5) or similar, which is a real motivating use case.

I can also see an error here: if the scorer requests sample_weight it won't receive it, and, unless I'm much mistaken, this will happen silently (i.e. there will be no error).

To fix this, we need to pass sample_weight to process_routing in fit. We would benefit from a test that the scorer receives sample_weight if requested.
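A minimal sketch of that fix, using the process_routing signature that appears elsewhere in this thread (the import path and names here are illustrative, not the final diff):

# Sketch only: include sample_weight in process_routing so it can be routed
# to the scorer/splitter when requested, instead of being dropped silently.
from sklearn.utils.metadata_routing import process_routing

def fit(self, X, y, sample_weight=None, **fit_params):
    routed_params = process_routing(
        obj=self,
        method="fit",
        other_params=fit_params,
        sample_weight=sample_weight,
    )
    # routed_params.scorer.score (and routed_params.splitter.split) now
    # include sample_weight whenever it was requested.
    ...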

@OmarManzoor
Contributor Author

> Thanks for this @omar! Nice work! I agree it's hard to know what to test, and we should probably define expectations for this family of PRs.
>
> I'd love to see a test, or an example, that uses a cv=GroupKFold(n_splits=5) or similar, which is a real motivating use case.
>
> I can also see an error here: if the scorer requests sample_weight it won't receive it, and, unless I'm much mistaken, this will happen silently (i.e. there will be no error).
>
> To fix this, we need to pass sample_weight to process_routing in fit. We would benefit from a test that the scorer receives sample_weight if requested.

Thank you for the guidance!

@OmarManzoor
Contributor Author
OmarManzoor commented Sep 26, 2022

> Thanks for this @omar! Nice work! I agree it's hard to know what to test, and we should probably define expectations for this family of PRs.
> I'd love to see a test, or an example, that uses a cv=GroupKFold(n_splits=5) or similar, which is a real motivating use case.
> I can also see an error here: if the scorer requests sample_weight it won't receive it, and, unless I'm much mistaken, this will happen silently (i.e. there will be no error).
> To fix this, we need to pass sample_weight to process_routing in fit. We would benefit from a test that the scorer receives sample_weight if requested.
>
> Thank you for the guidance!

I can't quite understand how to route the scoring parameters. Even if I pass sample_weight here

routed_params = process_routing(
    obj=self,
    method="fit",
    other_params=fit_params,
    sample_weight=sample_weight,
)

we still need some way to specify the child method of the scorer. However, the available methods only contain score, whereas the individual scorers themselves do not seem to define a score method. They instead have _score, which is called through the __call__ method of the base scorer. Could you kindly guide me a bit here? @jnothman

@jnothman
Member

Scorers don't have a score method, but for the purpose of allowing the user to set routing requests, we call it 'score'. So when the scorer is applied, we need to pass it the parameters that are routed to it by the name 'score'. Does that help, or is a code snippet more the right kind of support here?
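A small sketch of that idea (a hypothetical helper; routed_params and the scorer call convention follow the thread):

# Sketch: the scorer object has no literal `score` method, but its routing
# requests are registered under the method name "score", so that is the key
# to read from routed_params when the scorer is applied.
def _score_one_fold(lr_cv, scorer, X_test, y_test, routed_params):
    # Scorers are called as scorer(estimator, X, y, **score_params).
    return scorer(lr_cv, X_test, y_test, **routed_params.scorer.score)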

Member
@adrinjalali adrinjalali left a comment


These PRs should also add the meta-estimators here: https://github.com/scikit-learn/scikit-learn/blob/sample-props/sklearn/tests/test_metaestimators_metadata_routing.py

@jnothman, anything else you think should be tested on all meta-estimators apart from what's there?

        )

        return scoring(
            self, X, y, sample_weight=sample_weight, **routed_params.scorer.score
        )
Member


routed_params.scorer.score would include sample_weight, no need to pass it explicitly.
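In other words, the call above could simply become (sketch):

return scoring(self, X, y, **routed_params.scorer.score)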

@OmarManzoor
Contributor Author

https://github.com/scikit-learn/scikit-learn/blob/sample-props/sklearn/tests/test_metaestimators_metadata_routing.py

How would I actually define the estimator_name for this case?

@adrinjalali
Member

@OmarManzoor your link is to a file, I'm not sure what you're referring to :)

@OmarManzoor
Contributor Author

> @OmarManzoor your link is to a file, I'm not sure what you're referring to :)

For example:

{
    "metaestimator": MultiOutputRegressor,
    "estimator_name": "estimator",
    "estimator": ConsumingRegressor,
    "X": X,
    "y": y_multi,
    "routing_methods": ["fit", "partial_fit"],
    "warns_on": {
        "fit": ["sample_weight", "metadata"],
        "partial_fit": ["sample_weight"],
    },
},

We need the estimator_name here. For meta-estimators this is just the name of the sub-estimator parameter, but in this case it is a little different.

@adrinjalali
Member

Oh I see what you mean. This meta-estimator doesn't actually have any sub-estimators the way others do.

So we kinda need to replicate the same kinds of tests, but for scorer and cv in this case. @BenjaminBossan might have more concrete ideas.

Ideally we'd have them in a way to use them for other similar *CV classes.

@OmarManzoor
Contributor Author
OmarManzoor commented Oct 5, 2022

> Oh I see what you mean. This meta-estimator doesn't actually have any sub-estimators the way others do.
>
> So we kinda need to replicate the same kinds of tests, but for scorer and cv in this case. @BenjaminBossan might have more concrete ideas.
>
> Ideally we'd have them in a way to use them for other similar *CV classes.

@adrinjalali Could you kindly clarify one thing? For scorers it seems we won't raise any warnings or errors, since routing looks up the actual method specified, return getattr(self, method)._route_params(params=params), and scorers have no "score" method, so there is technically no set request for that method. So when writing tests, should we ignore both the warning and error tests?

Moreover, test_setting_request_removes_warning_or_error does not check any recorded metadata and only checks that the requested metadata functions correctly. Since no errors or warnings are raised in the first place, this would not be a beneficial test. Should we then create a separate function for the scorers of the meta-estimators that records the metadata and checks it, or is there something else we need to do for scorers?

@adrinjalali
Member

For scorers we would raise an error if routing is not correct. For instance, if the scorer supports sample_weight, and no request is set for score, but sample_weight is passed, we raise.

We don't warn for them since the existing code doesn't do the routing, therefore all of it would be new functionality.

I'm not sure why you say the test_setting_request_removes_warning_or_error test is not beneficial. I'm happy to add more tests, would you like to write it and we review here?
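A rough sketch of the failure mode described above, as it would look with the released routing API (the feature flag, exact error class, and error message are assumptions here, not what the branch did at the time):

import numpy as np
import pytest
import sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import accuracy_score, make_scorer

sklearn.set_config(enable_metadata_routing=True)

X, y = make_classification(random_state=0)
sample_weight = np.ones_like(y, dtype=float)

# accuracy_score accepts sample_weight, but no score request is set on the
# scorer, so routing sample_weight through fit should raise instead of
# silently dropping the weights.
lr_cv = LogisticRegressionCV(scoring=make_scorer(accuracy_score))
with pytest.raises(Exception, match="sample_weight"):
    lr_cv.fit(X, y, sample_weight=sample_weight)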

@OmarManzoor
Contributor Author

> For scorers we would raise an error if routing is not correct. For instance, if the scorer supports sample_weight, and no request is set for score, but sample_weight is passed, we raise.
>
> We don't warn for them since the existing code doesn't do the routing, therefore all of it would be new functionality.
>
> I'm not sure why you say the test_setting_request_removes_warning_or_error test is not beneficial. I'm happy to add more tests, would you like to write it and we review here?

Well, if an error is raised then I think we can use the existing tests. However, in that case I am not clear on how the error would be raised, since the underlying scorer does not have a "score" method, yet we try to check the signature of that method because "score" is the method name used during routing.

@adrinjalali
Member

The scorer is the score method itself, and the meta-estimator is the one doing the routing, and that's where the error would occur if anything is wrong.

Instead of checking the signature, we should now check the routing of the scorer object. Scorers expose routing the same way as estimators do.
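For example, something along these lines (a sketch; in released scikit-learn versions the routing feature flag has to be enabled first):

import sklearn
from sklearn.metrics import get_scorer

sklearn.set_config(enable_metadata_routing=True)

# Scorers carry routing requests just like estimators, so a test can inspect
# the request instead of inspecting the scorer's call signature.
scorer = get_scorer("accuracy").set_score_request(sample_weight=True)
print(scorer.get_metadata_routing())  # sample_weight is requested for "score"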

@OmarManzoor
Contributor Author

> The scorer is the score method itself, and the meta-estimator is the one doing the routing, and that's where the error would occur if anything is wrong.
>
> Instead of checking the signature, we should now check the routing of the scorer object. Scorers expose routing the same way as estimators do.

So I was trying to create a ConsumingScorer and use it within the existing test structure in test_metaestimators_metadata_routing, but I seem to be doing it incorrectly. Would it be okay for me to push the code so that you can take a look and guide me?

@adrinjalali
Member

Sure, happy to check.

@OmarManzoor
Contributor Author

> Sure, happy to check.

Pushed the code. It does not raise an error for the function test_error_for_other_methods.

@adrinjalali
Member

Could you please merge with the latest scikit-learn/sample-props branch?

@OmarManzoor
Contributor Author

@adrinjalali Any feedback on this? Is it incorrect to test for the ConsumingScorer like this? Or should we instead check the actual routing data and decide based on that?

@adrinjalali
Member

I'm trying to figure out why the CI fails here, but I get a "build not found" error when I try to look. Could you please push an empty commit? Hoping to see what the issue is.

Member
@adrinjalali adrinjalali left a comment


This is great work, @OmarManzoor, thank you!

Comment on lines 1817 to 1820
# Remove the sample_weight parameter from score_params as it is
# explicitly being passed in the path_func.
score_params = routed_params.scorer.score
score_params.pop("sample_weight", None)
Member


Those sample weights are used for fit, while the score params passed are used for scoring. With the current implementation, _log_reg_scoring_path doesn't take sample_weight into account for scoring, since it doesn't pass it to the score method when sample_weight is passed. Either it needs to not be popped here, or it needs to be handled explicitly there.

Contributor Author


The reason I popped it is that sample_weight is also passed explicitly to the _log_reg_scoring_path method. If we pass sample_weight directly and then also pass it through score_params, it just gets duplicated.

@@ -1984,3 +1985,33 @@ def test_warning_on_penalty_string_none():
    )
    with pytest.warns(FutureWarning, match=warning_message):
        lr.fit(iris.data, target)


def test_lr_cv_scorer_does_not_receive_sample_weight_when_score_request_is_not_set():
Member


I quite like these two tests, but the tests you had in the common tests were also in the right direction. I don't think the tests here cover what we need to test:

In the common tests we actually test whether metadata is passed to the underlying objects when it should be, and here we don't do that. I think it'd be nice to have specific tests for LRCV in the common tests using the machinery there, even if they're not part of the common tests like the other ones.

Contributor Author


The problem with the ones in the common tests was that they were not actually working properly for scorers. The score method does not raise an exception in the inner implementation of MethodMetadataRequest if we don't set the score request.

Member


Can you please write a minimal test for it? It should raise; there shouldn't be a difference between fit and score in that regard.

Contributor Author


Could you kindly provide an example of a scorer with LogisticRegressionCV that would raise UnsetMetadataPassedError when sample_weight is passed but the score request is not set?

Member


Generally it's not the consumer that raises the error; rather, the router does that when process_routing is called. So the only thing we need to test is that the scorer has the right request values.

Contributor Author


I see, so should I then use the code that I had earlier to record the metadata in the sample scorer and then check those values?

Member


Yes, pretty much; very similar to how we do it for consumer estimators there.
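A rough sketch of that pattern (a hypothetical recording scorer, not the actual ConsumingScorer from the common tests; assumes the released routing API with the feature flag enabled):

import sklearn
from sklearn.metrics import accuracy_score, make_scorer

sklearn.set_config(enable_metadata_routing=True)

received = {}

def recording_accuracy(y_true, y_pred, sample_weight=None):
    # Record what the scorer actually receives so a test can assert on it.
    received["sample_weight"] = sample_weight
    return accuracy_score(y_true, y_pred, sample_weight=sample_weight)

scorer = make_scorer(recording_accuracy).set_score_request(sample_weight=True)
# After fitting LogisticRegressionCV(scoring=scorer) with sample_weight,
# a test would check e.g. that received["sample_weight"] is not None.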

Contributor Author
@OmarManzoor OmarManzoor Nov 4, 2022


@adrinjalali The tests in CircleCI seem to be failing. I actually reverted the code to the point where it passed previously but it still fails. Could you kindly check?

Member


Seems to have something to do with set_output (I might be wrong). I'm checking. cc @thomasjpfan

Contributor Author


@adrinjalali The checks still seem to be failing but could you kindly check the test that I added in the common tests?

@OmarManzoor OmarManzoor force-pushed the logistic_regression_cv_routing branch from fb27db8 to ad9f9e9 on November 4, 2022 07:47
@adrinjalali
Member

Merging the latest sample-props branch should fix the CI issues: #24854

@OmarManzoor
Contributor Author

@adrinjalali Could you kindly have a look at the updates?

Member
@jnothman jnothman left a comment


Otherwise LGTM, thanks!

routed_params = process_routing(
    obj=lr_cv, method="fit", sample_weight=sample_weight, other_params=other_params
)
assert routed_params.scorer.score
Member


This is unnecessary given the next line.

assert not routed_params.scorer.score


def test_lr_cv_scorer_receives_sample_weight_when_score_request_is_set():
Member


We don't actually test that the weights are received or have an effect. Perhaps we should check that the results are different from when sample_weight is not requested.

Contributor Author
@OmarManzoor OmarManzoor Dec 30, 2022


Thank you for the suggestion. I just wanted to mention that there seems to be an issue if I try to call fit on LogisticRegressionCV when requesting sample_weight for the underlying scorer. The problem occurs at this point:
https://github.com/scikit-learn/scikit-learn/pull/24498/files#diff-66422c7ed307653c80e5d68ed353d44d8fbb02319c428e140daff082eaf9d6cbR742. Here sample_weight has a length of 10 whereas y_test has a length of 2, which causes an inconsistent-number-of-samples error.

Member


Please make sure to index a subset of weights for the train/test set like we do for X and y... Not that I've checked the code now
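A hypothetical helper to illustrate the point (not the actual change in the diff):

import numpy as np

def split_sample_weight(sample_weight, train, test):
    # Index the per-sample weights with the same train/test indices used for
    # X and y, so each fold sees weights of the matching length.
    if sample_weight is None:
        return None, None
    sample_weight = np.asarray(sample_weight)
    return sample_weight[train], sample_weight[test]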

Contributor Author


@jnothman I updated the sample weight to only index the test set at the required place and also updated the test. Could you kindly have a look?

Member
@adrinjalali adrinjalali left a comment


Thanks for the work so far @OmarManzoor

Comment on lines 785 to 786
if "sample_weight" in _score_params:
    _score_params["sample_weight"] = _score_params["sample_weight"][test]
Member


we should call _check_fit_params the same as done in BaseSearchCV, and index any metadata which might be sample-aligned.
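Roughly what that would look like (a fragment only; _check_fit_params is the private helper named above, and its name/location may have changed since):

from sklearn.utils.validation import _check_fit_params

# Index every sample-aligned metadata entry for the current test fold, the
# same way BaseSearchCV does, instead of special-casing sample_weight.
score_params = routed_params.scorer.score
score_params_test = _check_fit_params(X, score_params, indices=test)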

@@ -1751,6 +1760,9 @@ def fit(self, X, y, sample_weight=None):
            Array of weights that are assigned to individual samples.
            If not provided, then each sample is given unit weight.

        **fit_params : dict
            Parameters to pass to the underlying splitter and scorer.

Member


The parameters added to the public methods need a .. versionadded:: 2.0 directive.
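For example, the entry quoted above would become (the version number simply echoes this comment):

        **fit_params : dict
            Parameters to pass to the underlying splitter and scorer.

            .. versionadded:: 2.0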

@thomasjpfan thomasjpfan deleted the branch scikit-learn:sample-props June 2, 2023 20:43
@thomasjpfan thomasjpfan closed this Jun 2, 2023
@adrinjalali
Member

@OmarManzoor we have switched to a feature flag system (introduced in #26103), which makes things much easier for implementing slep6 for meta-estimators.

The big slep6 PR is now merged into main (#24027), and the sample-props branch is now removed from the main repo.

Would you be up for converting this PR to the feature flag enabled form, and submit the PR to main?

Also happy to document/answer your questions on how to do the feature flag thingy.
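For reference, a minimal sketch of the feature-flag workflow (API as in scikit-learn releases that include SLEP006; not specific to this PR):

import sklearn
from sklearn.linear_model import LogisticRegressionCV

# Metadata routing is opt-in behind a global configuration flag.
sklearn.set_config(enable_metadata_routing=True)

# With the flag on, consumers expose set_*_request methods, for example:
lr_cv = LogisticRegressionCV().set_fit_request(sample_weight=True)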

@OmarManzoor OmarManzoor deleted the logistic_regression_cv_routing branch June 3, 2023 18:34
@OmarManzoor
Contributor Author

> @OmarManzoor we have switched to a feature flag system (introduced in #26103), which makes things much easier for implementing slep6 for meta-estimators.
>
> The big slep6 PR is now merged into main (#24027), and the sample-props branch is now removed from the main repo.
>
> Would you be up for converting this PR to the feature flag enabled form, and submit the PR to main?
>
> Also happy to document/answer your questions on how to do the feature flag thingy.

Thank you for the suggestion. Sure, I would like to try this out as soon as I find time.
