8000 Implement `get_feature_names_out` for all estimators · Issue #21308 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

Implement get_feature_names_out for all estimators #21308

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
14 tasks done
adrinjalali opened this issue Oct 12, 2021 · 20 comments
Closed
14 tasks done

Implement get_feature_names_out for all estimators #21308

adrinjalali opened this issue Oct 12, 2021 · 20 comments
Labels
Enhancement High Priority High priority issues and pull requests
Milestone

Comments

@adrinjalali
Copy link
Member
adrinjalali commented Oct 12, 2021

This issue tracks the implementation of get_feature_names_out for all remaining estimators (follow-up work for #18444).

@thomasjpfan do you maybe have a list we could add here to track them?

Edit: let's do it by modules since the common test is organized this way:

Note: some notes on how to generate feature names in general:

https://github.com/scikit-learn/enhancement_proposals/blob/master/slep007/proposal.rst#output-feature-names

@adrinjalali
Copy link
Member Author

#21079 implements it for preprocessing module.

@ogrisel
Copy link
Member
ogrisel commented Oct 13, 2021

I updated description to make it work module by module.

@ogrisel ogrisel added the High Priority High priority issues and pull requests label Oct 13, 2021
@ogrisel
Copy link
Member
ogrisel commented Oct 13, 2021

@adrinjalali the SLEP007 need an update since we finally went for the get_feature_names_out method instead of a feature_names_out_ attribute.

@adrinjalali
Copy link
Member Author

@ogrisel here's the PR to update the SLEP: scikit-learn/enhancement_proposals#57

@MaxwellLZH
Copy link
Contributor

working on ensemble module

@ageron
Copy link
Contributor
ageron commented Oct 26, 2021

Shouldn't TransformerMixin provide a default implementation of get_feature_names_out()? The default behavior I expect is to return the names argument if is it not None, or else return self.feature_names_in_ if it exists, or else return [f"x{i}" for i in range(self.n_features_in_)].

This way, all I need to do is extend TransformerMixin and call _check_n_features(X, True) in the fit() method so it records self.n_features_in_.

Side note: these default names are zero-indexed, but when there are name conflicts make_pipeline() and make_column_transformer() are one-indexed. Not a big deal, but perhaps it's still time to make this consistent?

@thomasjpfan
Copy link
Member

Shouldn't TransformerMixin provide a default implementation of get_feature_names_out()?

My concern is for third party estimators that inherit TransformerMixin and are not "one-to-one". For example, PCA may have output feature names: [pca0, pca1, ...] where the number of features out is usually different from the number of features in. In this case, PCA needs to define it's own get_feature_names_out. For third party estimators that inherit TransformerMixin, I prefer not to automatically define get_feature_names_out that may not be suited for their estimator.

@ageron
Copy link
Contributor
ageron commented Oct 27, 2021

Thanks for your answer @thomasjpfan . I see your point, it would have to be opt-in to avoid breaking existing stuff. So could it be another mixin class, like FeatureNamesMixin? Or perhaps with a get_default_feature_names_out() function?

@thomasjpfan
Copy link
Member

There is a private _OneToOneFeatureMixin used by transformers with a simple one-to-one correspondence between input and output features:

class _OneToOneFeatureMixin:

@ageron
Copy link
Contributor
ageron commented Nov 1, 2021

Thanks @thomasjpfan , this looks like a good solution. Why not make it public?

@adrinjalali
Copy link
Member Author

I guess we're just figuring out what the public API should look like, once the API is stable, we can make them public.

@MrinalTyagi
Copy link
Contributor

Would like to work on neighbors and cluster

@thomasjpfan
Copy link
Member
thomasjpfan commented Dec 10, 2021

@MrinalTyagi I recommend working on one at a time to make the PR easier to review and faster to merge.

@MrinalTyagi
Copy link
Contributor

Sure, then ill start working on cluster

@Micky774
Copy link
Contributor

I'll get started on neighbors

@Micky774
Copy link
Contributor
Micky774 commented Jan 14, 2022

If anyone is interested, I'm looking for some discussion/feedback on my WIP pull request. Thanks.

@thomasjpfan
Copy link
Member

@MrinalTyagi Are you working on cluster? If not, we can open it up for other contributors.

@MrinalTyagi
Copy link
Contributor

@MrinalTyagi Are you working on cluster? If not, we can open it up for other contributors.

Sorry for not updating. Got busy in something else. Would love to see someone else initiate to complete the task.

@lorentzenchr
Copy link
Member

5 to go ... we are getting close.

@jeremiedbb
Copy link
Member

Every item is checked. Closing. Thanks everyone !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement High Priority High priority issues and pull requests
Projects
None yet
Development

No branches or pull requests

9 participants
0