What's the recommended approach for building a complete data frame (feature values + names) after using ColumnTransformer/FeatureUnion? · Issue #15755 · scikit-learn/scikit-learn · GitHub



Open
konradsemsch opened this issue Dec 2, 2019 · 8 comments

Comments

@konradsemsch

I posted a question on Stack Overflow but unfortunately got no reply. Could you guys please take a quick look at it and recommend something? I have a feeling there are more people looking for a similar, clear answer, which I haven't been able to find anywhere, so your input would be very much appreciated!

https://stackoverflow.com/questions/58989517/whats-the-recommended-approach-for-building-a-complete-data-frame-feature-valu

Best
Konrad

@NicolasHug
Member

We don't have an API for that yet unfortunately.

Feature-name propagation is something we are actively working on (see scikit-learn/enhancement_proposals#18), so hopefully this will be available in the near-ish future.

@konradsemsch
Author

Thank you for your reply. Could you recommend any particular approach for handling this with the current implementation of sklearn? Are there any resources you could point me to?

@amueller
Member
amueller commented Dec 3, 2019

@konradsemsch The point is that this requires a change to all transformers if you want it to work reliably. So you have two choices: pick a branch that implements this, like #12627 or #14238, or monkey-patch the estimators that you are using and that are missing feature names.
Often it can be as simple as doing `StandardScaler.get_feature_names = lambda x: x` or something like that.
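
As a rough illustration of the monkey-patching idea, here is a minimal sketch that uses a toy stand-in class rather than a real scikit-learn estimator, so it does not depend on any particular scikit-learn version. `ToyScaler`, `passthrough_feature_names`, and the `(self, input_features)` signature are assumptions made for this example only; it also assumes a one-to-one transformer whose output columns keep the input column names.

```python
# Toy stand-in (hypothetical, NOT a scikit-learn class) for a transformer
# that ships without any feature-name method.
class ToyScaler:
    def fit(self, rows):
        # Learn the per-column mean.
        n_cols = len(rows[0])
        self.means_ = [sum(r[i] for r in rows) / len(rows) for i in range(n_cols)]
        return self

    def transform(self, rows):
        # Center every column on its mean; column count and order are unchanged.
        return [[v - m for v, m in zip(r, self.means_)] for r in rows]


def passthrough_feature_names(self, input_features):
    # Assumption: the transformer is one-to-one, so output names == input names.
    return list(input_features)


# Monkey-patch: attach the missing method to the class after the fact.
ToyScaler.get_feature_names = passthrough_feature_names

columns = ["age", "income"]
rows = [[20.0, 1000.0], [40.0, 3000.0]]

scaler = ToyScaler().fit(rows)
values = scaler.transform(rows)
names = scaler.get_feature_names(columns)

# Pair each transformed column with its name, e.g. as input for a data frame.
table = [dict(zip(names, row)) for row in values]
print(table)
```

The same pattern only works for transformers that keep columns one-to-one; anything that adds, drops, or reorders columns would need its own name logic.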

@jnothman
Member
jnothman commented Dec 3, 2019 via email

@konradsemsch
Author

Hi guys, thank you for your suggestions, but this is exactly what I meant in my original question posted on Stack Overflow: there could be many different ways of doing it, but what would be the most efficient, recommended approach of doing it in more complicated pipelines involving FeatureUnion/ColumnTransformer? There's no clear answer to that anywhere on the net.

Would you mind responding to my original question with a tangible list of solutions that address that? I think many more people than just me would benefit from that until feature-name propagation is properly implemented natively within sklearn.

@NicolasHug
Member

> what would be the most efficient, recommended approach of doing it

@konradsemsch, the answers so far (and the lack thereof) indicate that there is no such recommended approach yet.

@amueller
Member
amueller commented Dec 3, 2019

@konradsemsch Well, you can check the pull requests that implement it. There is not really an easier way to do it.

@konradsemsch
Author

Haha, ok, I appreciate your honesty then! I will take a good look at the PRs, and I'm looking forward to eventually seeing a native solution for that :) Thanks!
