8000 MNT SLEP6 remove args that shouldn't be included in the routing by adrinjalali · Pull Request #29920 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

MNT SLEP6 remove args that shouldn't be included in the routing #29920

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Oct 29, 2024

Conversation

adrinjalali
Copy link
Member

Fixes #26505

This cleans up arguments which [probably] shouldn't be included in the routing. Some of them we might want to rename/deprecate (happy to do so in the same PR if that's how we want to proceed).

After this cleanup, the following script gives this output, which I think is okay(?)

from sklearn.utils import all_estimators
from sklearn.base import MetaEstimatorMixin
import inspect

for name, Cls in all_estimators():
    is_meta = issubclass(Cls, MetaEstimatorMixin)
    set_methods = [
        name
        for name, _ in inspect.getmembers(Cls, inspect.isroutine)
        if name.startswith("set_") and name.endswith("request")
    ]

    SKIP_ARGS = ["sample_weight", "classes", "return_std", "return_cov", "copy"]

    # get input arguments of the methods in set_methods
    args = {
        name: inspect.getfullargspec(getattr(Cls, name)).kwonlyargs
        for name in set_methods
    }
    for key, value in args.items():
        value = [arg for arg in value if arg not in SKIP_ARGS]
        if value:
            if is_meta:
                print(f"{name} is a meta-estimator")
            else:
                print(name)
            print(f"  {key}: {value}")
            print()
GradientBoostingClassifier is a meta-estimator
  set_fit_request: ['monitor']

GradientBoostingRegressor is a meta-estimator
  set_fit_request: ['monitor']

LabelBinarizer
  set_inverse_transform_request: ['threshold']

Lars
  set_fit_request: ['Xy']

LarsCV
  set_fit_request: ['Xy']

LassoLars
  set_fit_request: ['Xy']

LassoLarsCV
  set_fit_request: ['Xy']

LassoLarsIC
  set_fit_request: ['copy_X']

MDS
  set_fit_request: ['init']

PassiveAggressiveClassifier
  set_fit_request: ['coef_init', 'intercept_init']

PassiveAggressiveRegressor
  set_fit_request: ['coef_init', 'intercept_init']

Perceptron
  set_fit_request: ['coef_init', 'intercept_init']

SGDClassifier
  set_fit_request: ['coef_init', 'intercept_init']

SGDOneClassSVM
  set_fit_request: ['coef_init', 'offset_init']

SGDRegressor
  set_fit_request: ['coef_init', 'intercept_init']

cc @OmarManzoor @adam2392 @glemaitre

Copy link
github-actions bot commented Sep 23, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 0ceec41. Link to the linter CI: here

Copy link
Member
@adam2392 adam2392 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few thoughts:

It seems this requires every class to define something along the lines of:

__metadata_request__score = {"Something": metadata_routing.UNUSED}

Where something can be like check_input, X_test, …. It seems somewhat tedious, and we could easily miss these lines of code in the future. But I also don't see another way around it.

Is there an easy way to test this for future estimators?

Can we just reject certain keywords within metadata? But I guess this would restrict users as well.

@adrinjalali
Copy link
Member Author

Yeah we certainly don't want to have an "allowed list" kinda thing. Looking at the estimators out there like LightGBM, they have a wild range of things they pass to fit.

For new estimators, or whenever we change something, if we have something which by mistake ends up in the metadata routing machinery, it would create a corresponding set_{method}_request(param=...) kinda thing for it, which we should notice by looking at the rendered API page.

Also, looking at the things removed in this PR, other than check_input, they all have been there for a very long time. So the frequency at which we add things to the routing which shouldn't be there, is quite rare.

@glemaitre glemaitre self-requested a review October 3, 2024 10:25
@adrinjalali adrinjalali added this to the 1.6 milestone Oct 15, 2024
Co-authored-by: Guillaume Lemaitre <guillaume@probabl.ai>
@adrinjalali
Copy link
Member Author

@adam2392 does this look good to you to merge?

Comment on lines +1123 to +1124
# raw_documents should not be in the routing mechanism. It should have been
# called X in the first place.
Copy link
Member
@adam2392 adam2392 Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't need to be handled here, but I wonder does this warrant us renaming this to X to be consistent across the codebase, or low-priority?

Same for the other regions you mentioned.

Copy link
Member
@adam2392 adam2392 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @adrinjalali!

@adam2392
Copy link
Member

I guess I will do the green button :)

@adam2392 adam2392 merged commit e617d82 into scikit-learn:main Oct 29, 2024
30 checks passed
@adrinjalali adrinjalali deleted the slep6/arg-cleanup branch October 29, 2024 15:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SLEP006: make sure generated set_{method}_request methods are all legit
3 participants
0