Defines missing base.is_transformer API #30368


Closed

Conversation

thomasjpfan (Member)

Reference Issues/PRs

Related to #29677

What does this implement/fix? Explain your changes.

The is_transformer function is referenced in the API reference but never defined:

"is_transformer",

This PR defines it.
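For reference, a minimal sketch of what such a helper could look like, mirroring the existing is_classifier/is_regressor pattern and assuming the sklearn.utils.get_tags accessor from the new tags API; this is an illustration, not necessarily the exact code in the PR:

```python
# Sketch only: mirrors sklearn.base.is_classifier / is_regressor and
# relies on the estimator_type tag that this PR sets on TransformerMixin.
from sklearn.utils import get_tags


def is_transformer(estimator):
    """Return True if the estimator's tags mark it as a transformer."""
    return get_tags(estimator).estimator_type == "transformer"
```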

Any other comments?

I do not think we ever had an _estimator_type = "transformer", so there is no need to check for it.

@thomasjpfan thomasjpfan changed the title Adds base.is_transformer Defines missing base.is_transformer API Nov 28, 2024
@thomasjpfan thomasjpfan added this to the 1.6.1 milestone Nov 28, 2024
@thomasjpfan thomasjpfan added the To backport label (PRs merged in master that need a backport to a release branch, defined based on the milestone) Nov 28, 2024
@thomasjpfan (Member, Author)

CC @adrinjalali @glemaitre I'm putting this on the 1.6.1 milestone for now, but I'd be okay with having it in 1.6.


✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 44a5bcc

@glemaitre (Member)

In one of the previous PRs, I noticed that some vectorizers in scikit-learn implement fit_transform but do not inherit from TransformerMixin, and thus would not be identified as transformers, e.g. TfidfVectorizer.

Should we change those, considering that since they have the right methods they should be considered transformers?
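The mismatch is easy to demonstrate; a quick illustrative check, based on the class hierarchy described above:

```python
# TfidfVectorizer exposes fit_transform/transform but does not inherit
# from TransformerMixin, so a tag-based is_transformer would miss it.
from sklearn.base import TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer

vec = TfidfVectorizer()
print(hasattr(vec, "fit_transform"))      # True
print(isinstance(vec, TransformerMixin))  # False
```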

@adrinjalali (Member) left a comment

I think we can remove the mention of is_transformer instead, or have it check for the presence of the methods instead of relying on the tags.

Alternatively, the tag could be a list of estimator types.

Another reason I haven't added an is_transformer before is that I'm not sure what a transformer is. We have estimators which only implement a fit_transform and no transform; are they transformers? Not really, but they kind of are(?)
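A rough sketch of the method-presence alternative mentioned above (not the merged code); the fit_transform-only case is exactly the ambiguity raised here:

```python
# Duck-typing check: treat anything exposing transform, or only
# fit_transform (e.g. TSNE has fit_transform but no transform),
# as a transformer.
def is_transformer(estimator):
    return hasattr(estimator, "transform") or hasattr(estimator, "fit_transform")
```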

```diff
@@ -857,6 +857,7 @@ class TransformerMixin(_SetOutputMixin):
 
     def __sklearn_tags__(self):
         tags = super().__sklearn_tags__()
+        tags.estimator_type = "transformer"
```
Member (inline comment on the added line):

This is going to break many things, since we have a bunch of estimators that are transformers and also another type (classifier, regressor, etc.).

I had noticed this before, and I think it makes sense to only check for the presence of transform and/or fit_transform to decide whether something is a transformer.

@thomasjpfan (Member, Author) commented Nov 29, 2024

> Alternatively, the tag could be a list of estimator types.

If an estimator can have multiple types, then it feels better to have an enum like InputTags.
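For illustration, a multi-type tag could look like InputTags does today, i.e. a dataclass of booleans rather than a single string; the following is purely hypothetical and not scikit-learn API:

```python
# Hypothetical: modeled on the InputTags dataclass, so an estimator
# could be flagged as, say, both a transformer and a clusterer.
from dataclasses import dataclass


@dataclass
class EstimatorTypeTags:
    classifier: bool = False
    regressor: bool = False
    transformer: bool = False
    clusterer: bool = False
```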

Closing in favor of #30374, which removes the reference to is_transformer.
