UnaryEncoder · Issue #8628 · scikit-learn/scikit-learn · GitHub

UnaryEncoder #8628

Open
jnothman opened this issue Mar 22, 2017 · 14 comments
Labels
Moderate (anything that requires some knowledge of conventions and best practices), module:preprocessing, New Feature

Comments

@jnothman
Member

I'm sure we've discussed this before, but I'm not sure where, and there certainly does not appear to be an active PR. For ordinal (and discretized; see #7668) features, a "unary" encoding (is there a better name for this?) is more informative than a one-hot encoding. For k values 0, ..., k - 1 of the ordinal feature x, this creates k - 1 binary features such that the ith is active if x > i (for i = 0, ..., k - 2). Below is an initial implementation.

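import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.utils import check_array


class UnaryEncoder(BaseEstimator, TransformerMixin):
    def __init__(self, n_values):
        self.n_values = n_values

    def fit(self, X, y=None):
        # Stateless: n_values is supplied by the caller, nothing is learned.
        return self

    def transform(self, X):
        # Thresholds 0, ..., n_values - 2; output column i is active when x > i.
        values = np.arange(self.n_values - 1)
        X = check_array(X)
        # Broadcast each input column against the thresholds, then stack
        # the per-feature blocks side by side.
        Xt = np.hstack([values < X[:, i, None] for i in range(X.shape[1])])
        # Cast the boolean mask to integers so the output is 0/1.
        return Xt.astype(int)

>>> UnaryEncoder(3).fit_transform([[0], [1], [2]])
array([[0, 0],
       [1, 0],
       [1, 1]])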
@jnothman added the Moderate, Need Contributor, and New Feature labels on Mar 22, 2017
@jmschrei
Member

So the major difference is that instead of a single bit being active, all bits up to i are active? I've certainly used this before, but I did not know it was called a unary encoder!
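
For reference, the difference can be sketched in plain NumPy. This is only an illustration and not code from the issue; the array `x` and the level count `k` are made-up values.

```python
import numpy as np

# Illustrative ordinal feature with k = 4 levels (assumed values).
x = np.array([0, 1, 2, 3])
k = 4

# One-hot: exactly one of the k columns is active per sample.
one_hot = (x[:, None] == np.arange(k)).astype(int)

# Unary: column i is active when x > i, giving k - 1 columns,
# so every bit below the sample's level is switched on.
unary = (x[:, None] > np.arange(k - 1)).astype(int)

# The number of active bits in a unary row recovers the ordinal value.
decoded = unary.sum(axis=1)
```

Because the active bits accumulate, two samples with nearby levels share most of their bits, which is the ordering information a one-hot encoding discards.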

@jnothman
Member Author
jnothman commented Mar 22, 2017 via email

@jmschrei
Member

Yes, I agree that it would be nice to have included. I just colloquially referred to it as a cumulative bit encoder, but that seems long.

@jnothman
Member Author
jnothman commented Mar 22, 2017 via email

@arjunjauhari

How about StepEncoder, since it behaves very much like a unit step function? Also, if no one is already working on it, I would like to implement it.

@jmschrei
Member

Go ahead. We can figure out a name later.

@jnothman
Member Author
jnothman commented Mar 23, 2017 via email

@jnothman
Member Author
jnothman commented Mar 23, 2017 via email

@arjunjauhari

Going ahead with OrdinalEncoder for now.

@avannaldas

Maybe "LeftHotEncoder" and "RightHotEncoder", if the latter scenario makes sense? Has this been taken up, by the way?

@avannaldas

Apologies, I guess it's taken up and a work in progress. I didn't notice the PR reference; I am on a mobile device.

@raghavrv
Member

This has an active PR at #8652

@cmarmo
Contributor
cmarmo commented Dec 16, 2021

More recent work in #12893.

@glevv
Contributor
glevv commented Nov 3, 2022

It's not very clear how to implement a unary encoder (also called a Thermometer Encoder or RankHot Encoder) as a categorical encoder, since it is designed for ordered values, not just any category.

But it should be ok to add it as an option to KBinsDiscretizer, or not?
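
As a rough sketch of what that could look like today without any new option: bin with `KBinsDiscretizer(encode="ordinal")`, then threshold the bin indices. The helper name `thermometer_from_bins` is hypothetical and not part of scikit-learn, and the uniform strategy and toy data are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer


def thermometer_from_bins(X, n_bins=4):
    """Hypothetical helper: bin each column, then unary-encode the bin index."""
    # encode="ordinal" returns the integer bin index for each feature.
    binner = KBinsDiscretizer(n_bins=n_bins, encode="ordinal", strategy="uniform")
    bins = binner.fit_transform(X).astype(int)
    blocks = []
    for j in range(bins.shape[1]):
        # n_bins_ holds the number of bins actually used for feature j.
        thresholds = np.arange(binner.n_bins_[j] - 1)
        # Indicator "bin index > threshold" for each threshold.
        blocks.append(bins[:, j, None] > thresholds)
    return np.hstack(blocks).astype(int)


# Toy data (assumed): one feature whose values fall in four uniform bins.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
encoded = thermometer_from_bins(X, n_bins=4)
```

Since the bin indices are ordered by construction, the objection about arbitrary categories does not apply here.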

7 participants