[MRG] Keep order of variables in LabelEncoder by samuelduchesne · Pull Request #1055 · scikit-optimize/scikit-optimize · GitHub
This repository was archived by the owner on Feb 28, 2024. It is now read-only.

[MRG] Keep order of variables in LabelEncoder #1055

Open: wants to merge 3 commits into base: master

Conversation

samuelduchesne

The problem

The current behavior of the LabelEncoder is to sort the categories when the mapping is built. This happens because it uses np.unique, which returns a sorted array of unique values. See https://github.com/Elementa-Engineering/scikit-optimize/blob/master/skopt/space/transformers.py#L175-L177 and https://numpy.org/doc/stable/reference/generated/numpy.unique.html

For example:

>>> from skopt.space.space import Categorical

>>> c = Categorical(("c", "b", "a"), transform="label")
>>> c.transform(["a", "b", "c"])
[0, 1, 2]

Note that the returned labels are 0, 1 and 2, i.e. they follow the sorted order ("a", "b", "c") even though the specified order was ("c", "b", "a"). This can be counter-intuitive, especially when the order of the categories carries meaning for the user.
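The sorting can be reproduced with NumPy alone. The snippet below is a minimal sketch rather than the skopt source: the variable names are illustrative, and the mapping step only assumes that labels are assigned by position in the array that np.unique returns.

```python
import numpy as np

categories = ("c", "b", "a")

# np.unique returns the sorted unique values, so the input order is lost.
classes = np.unique(categories)

# Assigning labels by position in the sorted array ties them to
# alphabetical order rather than to the order the user specified.
mapping = {str(value): label for label, value in enumerate(classes)}
labels = [mapping[x] for x in ["a", "b", "c"]]

print(classes.tolist(), labels)
```

Here "a" receives label 0 although it was listed last, which matches the behavior shown above.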

Implemented Fix

This PR implements a simple fix that retains the order in which the categories were specified. The expected behavior then becomes:

>>> from skopt.space.space import Categorical

>>> c = Categorical(("c", "b", "a"), transform="label")
>>> c.transform(["a", "b", "c"])
[2, 1, 0]

The specified order is preserved.

The same goes for numeric categories:

>>> from skopt.space.space import Categorical

>>> c = Categorical((10, 30, 20), transform="label")
>>> c.transform([10, 20, 30])
[0, 2, 1]
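For reviewers who want to see the idea in isolation, here is a minimal sketch of an order-preserving mapping. It is not the diff in this PR: `ordered_unique` and `fit_mapping` are hypothetical helpers written only for illustration, and they assume the categories are hashable.

```python
def ordered_unique(values):
    """Return the unique values in first-seen order (hypothetical helper)."""
    # dict keys keep insertion order (Python 3.7+), so duplicates are
    # dropped without sorting.
    return list(dict.fromkeys(values))


def fit_mapping(categories):
    """Map each category to its position in the user-specified order."""
    return {value: label for label, value in enumerate(ordered_unique(categories))}


str_mapping = fit_mapping(("c", "b", "a"))
str_labels = [str_mapping[x] for x in ["a", "b", "c"]]

num_mapping = fit_mapping((10, 30, 20))
num_labels = [num_mapping[x] for x in [10, 20, 30]]

print(str_labels, num_labels)
```

With this mapping, each label is the category's index in the tuple passed by the user, which gives the same outputs as the two examples above.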

@samuelduchesne changed the title from "Keep order of variables in LabelEncoder" to "[MRG] Keep order of variables in LabelEncoder" on Sep 14, 2021
@samuelduchesne
Author

@kernc, not sure why CI didn't fire up here, but this is ready for a review. :)

@QuentinSoubeyran
Contributor

Try pushing a commit again to trigger the CI, perhaps?

@samuelduchesne
Author

> Try to push a commit again to trigger the CI, perhaps?

Still not working! Weird!

@QuentinSoubeyran
Contributor

Well, I'm at a loss here... Have you run the tests locally using pytest? Maybe the CI is straight-up crashing on this PR, hence no info?
