What happened to the idea of adding a 'handle_missing' parameter to the OneHotEncoder?

Discussed in #26531

Originally posted by woodly0, June 7, 2023
Hello,
I'm having trouble understanding what finally happened to the idea of introducing a handle_missing parameter for the OneHotEncoder. My current project could still benefit from such an implementation.
There are many existing issues on this topic; however, I cannot work out what was finally decided or implemented and what wasn't.

Consider the following features:

import pandas as pd

test_df = pd.DataFrame(
    {"col1": ["red", "blue", "blue"], "col2": ["car", None, "plane"]}
)

When using the encoder:

from sklearn.preprocessing import OneHotEncoder

ohe = OneHotEncoder(
    handle_unknown="ignore",
    sparse_output=False,
    #handle_missing="ignore"
)
ohe.fit_transform(test_df)

I get the output:

array([[0., 1., 1., 0., 0.],
       [1., 0., 0., 0., 1.],
       [1., 0., 0., 1., 0.]])

But what I'm actually looking for is to drop the None, i.e., not create a new feature for it but set all the other columns of that feature to zero:

array([[0., 1., 1., 0.],
       [1., 0., 0., 0.],
       [1., 0., 0., 1.]])

Is there a way to achieve this without using another transformer object?
