What's the recommended approach for building a complete data frame (feature values + names) after using ColumnTransformer/FeatureUnion? #15755
Comments
We don't have an API for that yet, unfortunately. Feature-name propagation is something we are actively working on (scikit-learn/enhancement_proposals#18), so hopefully it will be available in the near-ish future.
Thank you for your reply. Could you recommend any particular approach for handling this with the current implementation of sklearn? Are there any resources you could point me to?
@konradsemsch The point is that this requires a change to all transformers if you want it to work reliably. So you have two choices: pick a branch that implements this, like #12627 or #14238, or monkey-patch the estimators you are using that are missing feature names.
You can also inherit from ColumnTransformer and define transform as `return DataFrame(super().transform(X), columns=self.get_feature_names())` to finish the job... But yes, there are other pieces needed to do this well.
Hi guys, thank you for your suggestions, but this is pretty much exactly what I meant in my original question posted on Stack Overflow: there could be many different ways of doing it, but what would be the most efficient, recommended approach in more complicated pipelines involving FeatureUnion/ColumnTransformer? There's no clear answer to that anywhere on the net. Would you mind responding to my original question with a tangible list of solutions? I think many more people than just me would benefit from that before feature_names propagation is properly implemented within …
@konradsemsch, the answers so far (and the lack thereof) indicate that there is no such recommended approach yet.
@konradsemsch, well, you can check the pull requests that implement it. There is not really an easier way to do it.
Haha, OK, I appreciate your honesty then! I'll take a good look at the PRs, and I'm looking forward to eventually seeing a native solution for that :) Thanks!
I posted a question on Stack Overflow, but unfortunately it got no replies. Could you guys please take a quick look at it and recommend something? I have a feeling there are more people looking for a similar, clear answer, which I haven't been able to find anywhere, so your input would be very much appreciated!
https://stackoverflow.com/questions/58989517/whats-the-recommended-approach-for-building-a-complete-data-frame-feature-valu
Best
Konrad