ENH add feature_names_in_ in FeatureUnion #25220
- added `self._check_feature_names(...)` to the `.fit(...)` method in `FeatureUnion` to allow access to the `.feature_names_in_` attribute if `X` has feature names, e.g. a `pandas.DataFrame`
- updated the `FeatureUnion` docstring to reflect the addition of the `.feature_names_in_` attribute

modified: sklearn/tests/test_pipeline.py
- added `test_feature_union_feature_names_in_()` to test that `FeatureUnion` has a `.feature_names_in_` attribute if fitted with a `pandas.DataFrame` and not if fitted with a `numpy` array
- changelog updated with description of work
- made changelog description more precise
- typo -- removed period (.) before `columns`
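The initial approach in these commits records feature names at fit time, the way scikit-learn's private `BaseEstimator._check_feature_names(X, reset=True)` does. The pure-Python sketch below only illustrates that idea: `FrameLike`, `ToyUnion`, and `_record_feature_names` are hypothetical names, not scikit-learn API, and the real helper does more (it stores names as a NumPy array and re-checks them when data is passed again after fitting).

```python
from typing import Any, Sequence


class FrameLike:
    """Hypothetical stand-in for a pandas.DataFrame: only exposes `columns`."""

    def __init__(self, columns: Sequence[str]) -> None:
        self.columns = list(columns)


class ToyUnion:
    """Hypothetical toy estimator; NOT scikit-learn's FeatureUnion."""

    def fit(self, X: Any) -> "ToyUnion":
        # Record (or clear) input feature names before doing any real work.
        self._record_feature_names(X)
        return self

    def _record_feature_names(self, X: Any) -> None:
        # Keep names only when X exposes a `columns` attribute whose
        # entries are all strings (as a DataFrame with string labels does).
        columns = getattr(X, "columns", None)
        if columns is not None and all(isinstance(c, str) for c in columns):
            self.feature_names_in_ = list(columns)
        elif hasattr(self, "feature_names_in_"):
            # Refitting on unnamed data (e.g. a plain array) drops stale names.
            del self.feature_names_in_
```

With this pattern, fitting on named data exposes `feature_names_in_`, while fitting on an array-like without `columns` leaves the attribute undefined, which is exactly what the test described above checks.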
Thanks for the PR!
- removed `self._check_feature_names(...)` from the `.fit(...)` method in `FeatureUnion`
- added a `feature_names_in_` property to `FeatureUnion` that uses the first transformer's `feature_names_in_` attribute if present

modified: sklearn/tests/test_pipeline.py
- updated the docstring for `test_feature_union_feature_names_in_()` to be more precise
- added assertions checking that the `feature_names_in_` attribute is available on `FeatureUnion` when it is instantiated with a transformer that has already been fit
- updated changelog description to include `pandas.DataFrame`
- corrected user signature to match GitHub account
- added pandas import to `test_feature_union_feature_names_in_` so the azure-pipelines job no longer fails with an `ImportError`
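The revised design delegates to the first transformer instead of storing names on the union itself. Because every transformer in a `FeatureUnion` receives the same `X`, the first one's `feature_names_in_` also describes the union's input. The sketch below mimics that delegation in plain Python; `NamedTransformer`, `UnnamedTransformer`, and `ToyUnion` are hypothetical classes, not scikit-learn code.

```python
from typing import Any, List, Tuple


class NamedTransformer:
    """Hypothetical fitted transformer that recorded its input feature names."""

    def __init__(self, names: List[str]) -> None:
        self.feature_names_in_ = list(names)


class UnnamedTransformer:
    """Hypothetical transformer fitted on data without column names."""


class ToyUnion:
    """Hypothetical toy union; NOT scikit-learn's FeatureUnion."""

    def __init__(self, transformer_list: List[Tuple[str, Any]]) -> None:
        self.transformer_list = transformer_list

    @property
    def feature_names_in_(self) -> List[str]:
        # All transformers see the same X, so the first one's names
        # describe the union's input. If the first transformer has no
        # such attribute, its AttributeError propagates, which makes
        # hasattr(union, "feature_names_in_") return False.
        return self.transformer_list[0][1].feature_names_in_
```

A property keeps no duplicate state on the union, and it explains the extra test assertions: a transformer fitted before being placed in the union already carries `feature_names_in_`, so the union exposes the names without being refit.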
…it176131/scikit-learn into feature_union_feature_names_in_
Thank you for the update!
newline/whitespace between change log updates. Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
added period at end of docstring Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
- removed train-test-split per code suggestion -- using `X` directly
@thomasjpfan -- I've made the changes you suggested. Ready for another review! 🙂
LGTM
LGTM. Thanks @it176131
Reference Issues/PRs
ENHANCEMENT #24754
What does this implement/fix? Explain your changes.
The `FeatureUnion` class did not previously have the `.feature_names_in_` attribute if fitted with a `pandas.DataFrame`. This change allows access to the attribute.

Any other comments?
modified: sklearn/pipeline.py
- added `self._check_feature_names(...)` to the `.fit(...)` method in `FeatureUnion` to allow access to the `.feature_names_in_` attribute if `X` has feature names, e.g. a `pandas.DataFrame`
- updated the `FeatureUnion` docstring to reflect the addition of the `.feature_names_in_` attribute

modified: sklearn/tests/test_pipeline.py
- added `test_feature_union_feature_names_in_()` to test that `FeatureUnion` has a `.feature_names_in_` attribute if fitted with a `pandas.DataFrame` and not if fitted with a `numpy` array