What happened to the idea of adding a 'handle_missing' parameter to the OneHotEncoder? · Issue #26543 · scikit-learn/scikit-learn · GitHub
Open
Description

Discussed in #26531

Originally posted by woodly0 June 7, 2023
Hello,
I'm having trouble understanding what finally happened to the idea of introducing a handle_missing parameter for the OneHotEncoder. My current project could still benefit from such an implementation.
There are many existing issues on this topic; however, I cannot deduce what was finally decided or implemented and what wasn't.

Considering the following features:

import pandas as pd

test_df = pd.DataFrame(
    {"col1": ["red", "blue", "blue"], "col2": ["car", None, "plane"]}
)

when using the encoder:

from sklearn.preprocessing import OneHotEncoder

ohe = OneHotEncoder(
    handle_unknown="ignore",
    sparse_output=False,
    # handle_missing="ignore"  # desired option; not an existing parameter
)
ohe.fit_transform(test_df)

I get the output:

array([[0., 1., 1., 0., 0.],
       [1., 0., 0., 0., 1.],
       [1., 0., 0., 1., 0.]])

but what I'm actually looking for is to drop the None category, i.e. not create a new feature for it and instead set all the other col2 features to zero for that row:

array([[0., 1., 1., 0.],
       [1., 0., 0., 0.],
       [1., 0., 0., 1.]])

Is there a way to achieve this without using another transformer object?
