[MRG] ENH Add get_feature_names for various transformers #6431

yenchenlin · 2016-02-23T14:34:46Z

This is a PR for #6425 .
I've added get_feature_names for scalers, normalizers and imputers.

Is the purpose of this function to maintain compatibility when get_feature_names() is implemented in Pipeline?

Can @jnothman please have a look?

jnothman · 2016-02-24T01:54:43Z

Would I be able to get you to merge together all these trivial cases - #6431, #6432, #6436 - into one PR? Sorry for not suggesting that in the first place. Whatever decisions we make about API and testing we'll need to make consistently.

yenchenlin · 2016-02-24T02:14:40Z

@jnothman Sure!

yenchenlin · 2016-02-24T05:22:24Z

Hello @jnothman and @nelson-liu ,
I've cherry-pick the commits from #6432 and #6436 .
I'll close #6432 once you guys confirm.

kmike · 2016-09-12T19:16:28Z

sklearn/preprocessing/_function_transformer.py

+
+        """
+        if input_features is not None:
+            input_features = list(map(lambda input_feature: 'f(' +


What's the reason function name is not used here (self.func.__name__)?

I don't consider it very stable...?

Right, there can be lambdas, etc, but that's still the best information available. It allow to e.g. get a more inspectable feature names if several FunctionTransformers are used in a FeatureUnion. Is this use case uncommon?

To clarify, repr is unstable, but func.__name__ is more deterministic:

>>> def foo(x): pass >>> repr(foo) '<function foo at 0x106683950>' >>> foo.__name__ 'foo'

We've got to balance a few things here:

a machine-readable description of a feature's provenance (ideally a structured representation)

a human readable (in limited memory!) description of a feature as distinct from a neighbour

backwards compatibility issues (although I am willing to label some of these things as experimental and subject to change)

Yeah, good points.

a machine-readable description of a feature's provenance (ideally a structured representation)

Are you saying that string formatting is not powerful enough for these use cases, and there is a need for more structured representation (a dict, a special object)? What additional information are you thinking of?

a human readable (in limited memory!) description of a feature as distinct from a neighbour

In eli5 library instead of feature name lists we used a special object which generates feature names on a fly if they are autogenerated ('pca__x0' as opposed to 'petal width'), in order to avoid keeping it all in memory, and to save CPU time when only a subset of feature names is needed. For example, a common task is to get feature names which correspond to non-zero coefficients in a particular document, or feature names for top N features with largest coefficients. CPU was even more important than memory because there were noticeable delays in interactive usage.

jnothman · 2016-11-16T09:11:27Z

Yes, I can see the appeal in that approach, but still don't have a clear
resolution here.

I can also see that __ might be an appropriate way to designate function
application. Hmm.

On 16 November 2016 at 19:01, Mikhail Korobov notifications@github.com
wrote:

@kmike commented on this pull request.

In sklearn/preprocessing/_function_transformer.py
#6431:
   """
   Return feature names for output features
   Parameters
   input_features : list of string, length len(input_features), optional
       String names for input features if available. By default,
       None is used.
   Returns
   output_feature_names : list of string, length len(input_features)
   """
   if input_features is not None:
       input_features = list(map(lambda input_feature: 'f(' +
Yeah, good points.

a machine-readable description of a feature's provenance (ideally a
structured representation)

Are you saying that string formatting is not powerful enough for these use
cases, and there is a need for more structured representation (a dict, a
special object)? What additional information are you thinking of?

a human readable (in limited memory!) description of a feature as distinct
from a neighbour

In eli5 https://github.com/TeamHG-Memex/eli5 library instead of feature
name lists we used a special object which generates feature names on a fly
if they are autogenerated ('pca__x0' as opposed to 'petal width'), in order
to avoid keeping it all in memory, and to save CPU time when only a subset
of feature names is needed. For example, a common task is to get feature
names which correspond to non-zero coefficients in a particular document,
or feature names for top N features with largest coefficients. CPU was even
more important than memory because there were noticeable delays in
interactive usage.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#6431, or mute the
thread
https://github.com/notifications/unsubscribe-auth/AAEz609Q5J2Y0_N728GGBSZ5qnRwQABCks5q-rhagaJpZM4Hgp7U
.

jnothman · 2017-01-18T03:00:14Z

sklearn/preprocessing/_function_transformer.py

+        output_feature_names : list of string, length len(input_features)
+
+        """
+        if input_features is not None:


I think this logic is incorrect. func is not applied to each feature independently, but to X as a whole. The only way to get sensible feature names out of a FunctionTransformer is to have another parameter specify get_feature_names, essentially, or else to count the number of columns output from transform and make up default names on that basis.

Do you think a string template (passed to FunctionTransformer constructor?) is enough?

No, something like FunctionTransformer(lambda X_df: X_df['col_a', 'col_b']) should have feature names ['col_a', 'col_b'].

lucasdavid · 2017-12-12T21:54:22Z

sklearn/preprocessing/imputation.py

+        output_feature_names : list of string, length len(input_features)
+
+        """
+        return input_features


What if some of the features had only nan in them? Shouldn't this method clip them from input_features, as Imputer does to the column itself?

thomasjpfan · 2022-02-02T05:06:56Z

Closing because we have moved forward with SLEP007 and transformers will adopt get_feature_names_out.

Tracking issue: #21308

Add get_feature_names for normalizers, scalers and imputers

4497774

jnothman mentioned this pull request Feb 23, 2016

feature: add get_feature_names() and tests to FunctionTransformer #6436

Closed

yenchenlin and others added 2 commits February 24, 2016 13:12

Add get_feature_names for Binarizer

10000

b8e6ee5

feature: add get_feature_names() and tests to FunctionTransformer

bb838d7

yenchenlin changed the title ~~[MRG] ENH Add get_feature_names for normalizers, scalers and imputers~~ [MRG] ENH Add get_feature_names for various transformers Feb 24, 2016

yenchenlin mentioned this pull request Feb 24, 2016

[MRG] ENH Add get_feature_names for Binarizer #6432

Closed

jnothman mentioned this pull request Feb 24, 2016

Transformative get_feature_names for various transformers #6425

Closed

11 tasks

kmike reviewed Sep 12, 2016
View reviewed changes

jnothman reviewed Jan 18, 2017

View reviewed changes

kmike mentioned this pull request Jul 31, 2017

transform_feature_names for scalers TeamHG-Memex/eli5#229

Merged

lucasdavid reviewed Dec 12, 2017

View reviewed changes

amueller added the Superseded PR has been replace by a newer PR label Aug 5, 2019

github-actions bot added the module:preprocessing label Mar 2, 2020

Base automatically changed from master to main January 22, 2021 10:49

thomasjpfan closed this Feb 2, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

[MRG] ENH Add get_feature_names for various transformers #6431

[MRG] ENH Add get_feature_names for various transformers #6431

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

@kmike commented on this pull request.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

[MRG] ENH Add get_feature_names for various transformers #6431

[MRG] ENH Add get_feature_names for various transformers #6431

Uh oh!

Conversation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

@kmike commented on this pull request.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!