MNT SLEP6 remove args that shouldn't be included in the routing #29920

adrinjalali · 2024-09-23T16:55:30Z

This cleans up arguments which [probably] shouldn't be included in the routing. Some of them we might want to rename/deprecate (happy to do so in the same PR if that's how we want to proceed).

After this cleanup, the following script gives this output, which I think is okay(?)

import inspect

from sklearn.base import MetaEstimatorMixin
from sklearn.utils import all_estimators

# Arguments that are expected to appear in the set_{method}_request signatures.
SKIP_ARGS = ["sample_weight", "classes", "return_std", "return_cov", "copy"]

for name, Cls in all_estimators():
    is_meta = issubclass(Cls, MetaEstimatorMixin)
    # Collect the set_{method}_request methods exposed by the estimator.
    set_methods = [
        method
        for method, _ in inspect.getmembers(Cls, inspect.isroutine)
        if method.startswith("set_") and method.endswith("request")
    ]

    # Get the keyword-only arguments of each method in set_methods.
    args = {
        method: inspect.getfullargspec(getattr(Cls, method)).kwonlyargs
        for method in set_methods
    }
    for method, params in args.items():
        # Report only the arguments that are not in SKIP_ARGS.
        params = [arg for arg in params if arg not in SKIP_ARGS]
        if params:
            if is_meta:
                print(f"{name} is a meta-estimator")
            else:
                print(name)
            print(f"  {method}: {params}")
            print()

GradientBoostingClassifier is a meta-estimator
  set_fit_request: ['monitor']

GradientBoostingRegressor is a meta-estimator
  set_fit_request: ['monitor']

LabelBinarizer
  set_inverse_transform_request: ['threshold']

Lars
  set_fit_request: ['Xy']

LarsCV
  set_fit_request: ['Xy']

LassoLars
  set_fit_request: ['Xy']

LassoLarsCV
  set_fit_request: ['Xy']

LassoLarsIC
  set_fit_request: ['copy_X']

MDS
  set_fit_request: ['init']

PassiveAggressiveClassifier
  set_fit_request: ['coef_init', 'intercept_init']

PassiveAggressiveRegressor
  set_fit_request: ['coef_init', 'intercept_init']

Perceptron
  set_fit_request: ['coef_init', 'intercept_init']

SGDClassifier
  set_fit_request: ['coef_init', 'intercept_init']

SGDOneClassSVM
  set_fit_request: ['coef_init', 'offset_init']

SGDRegressor
  set_fit_request: ['coef_init', 'intercept_init']

cc @OmarManzoor @adam2392 @glemaitre

github-actions · 2024-09-23T16:56:50Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 0ceec41. Link to the linter CI: here

adam2392

A few thoughts:

It seems this requires every class to define something along the lines of:

__metadata_request__score = {"Something": metadata_routing.UNUSED}

Here "Something" can be an argument such as check_input, X_test, …. This seems somewhat tedious, and we could easily miss these lines of code in the future, but I also don't see another way around it.

Is there an easy way to test this for future estimators?

Can we just reject certain keywords within metadata? But I guess this would restrict users as well.

adrinjalali · 2024-09-27T14:58:04Z

Yeah we certainly don't want to have an "allowed list" kinda thing. Looking at the estimators out there like LightGBM, they have a wild range of things they pass to fit.

For new estimators, or whenever we change something, if we have something which by mistake ends up in the metadata routing machinery, it would create a corresponding set_{method}_request(param=...) kinda thing for it, which we should notice by looking at the rendered API page.

Also, looking at the things removed in this PR, other than check_input, they all have been there for a very long time. So we only rarely add things to the routing which shouldn't be there.
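One hypothetical way to complement the visual check of the rendered API page is a small helper that flags unexpected arguments in any set_{method}_request signature. This is only a sketch built on the same inspect calls as the script in the PR description, not something this PR adds. EXPECTED_ARGS mirrors that script's SKIP_ARGS list, and unexpected_request_args and FakeEstimator are names invented for illustration.

```python
import inspect

# Arguments we expect to be routable; mirrors SKIP_ARGS from the PR description.
EXPECTED_ARGS = {"sample_weight", "classes", "return_std", "return_cov", "copy"}


def unexpected_request_args(cls, expected=EXPECTED_ARGS):
    """Map each set_*_request method of cls to its arguments not in expected."""
    found = {}
    for method, func in inspect.getmembers(cls, inspect.isroutine):
        if not (method.startswith("set_") and method.endswith("_request")):
            continue
        kwonly = inspect.getfullargspec(func).kwonlyargs
        extra = sorted(arg for arg in kwonly if arg not in expected)
        if extra:
            found[method] = extra
    return found


class FakeEstimator:
    # Hypothetical stand-in whose request method exposes a stray argument.
    def set_fit_request(self, *, sample_weight=None, check_input=None):
        return self


report = unexpected_request_args(FakeEstimator)
print(report)
```

In a real test one would presumably loop over all_estimators() and compare the report against a short list of known, accepted exceptions, so that a newly leaked argument shows up as a failure.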

doc/whats_new/upcoming_changes/metadata-routing/29920.fix.rst

Co-authored-by: Guillaume Lemaitre <guillaume@probabl.ai>

adrinjalali · 2024-10-29T11:09:13Z

@adam2392 does this look good to you to merge?

adam2392 · 2024-10-29T13:43:23Z

sklearn/feature_extraction/text.py

+    # raw_documents should not be in the routing mechanism. It should have been
+    # called X in the first place.


This doesn't need to be handled here, but I wonder whether this warrants renaming raw_documents to X to be consistent across the codebase, or whether it is low priority.

Same for the other regions you mentioned.

adam2392

LGTM. Thanks @adrinjalali!

adam2392 · 2024-10-29T13:44:08Z

I guess I will do the green button :)

MNT remove args that shouldn't be included in the routing

ca17672

adrinjalali added 3 commits September 23, 2024 19:01

changelog

6bedcea

remove unneeded line

dff77ee

Merge remote-tracking branch 'upstream/main' into slep6/arg-cleanup

8fc4020

adam2392 reviewed Sep 27, 2024

glemaitre self-requested a review October 3, 2024 10:25

adrinjalali added this to the 1.6 milestone Oct 15, 2024

Merge remote-tracking branch 'upstream/main' into slep6/arg-cleanup

9b4c72d

glemaitre reviewed Oct 28, 2024

doc/whats_new/upcoming_changes/metadata-routing/29920.fix.rst Outdated

glemaitre approved these changes Oct 28, 2024

Update doc/whats_new/upcoming_changes/metadata-routing/29920.fix.rst

0ceec41

Co-authored-by: Guillaume Lemaitre <guillaume@probabl.ai>

adam2392 reviewed Oct 29, 2024

adam2392 approved these changes Oct 29, 2024

adam2392 merged commit e617d82 into scikit-learn:main Oct 29, 2024
30 checks passed

adrinjalali deleted the slep6/arg-cleanup branch October 29, 2024 15:50
