Discussed in #26531
Originally posted by woodly0 June 7, 2023
Hello,
I'm having trouble understanding what finally happened to the idea of introducing a handle_missing parameter for the OneHotEncoder. My current project could still benefit from such an implementation.
There are many existing issues on this topic; however, I cannot deduce what was finally decided or implemented and what wasn't.
- Handle missing values in OneHotEncoder #11996
- ENH: handle missing values in OneHotEncoder #12025
- ENH Adds missing value support to OneHotEncoder #17317
- Include drop='last' to OneHotEncoder #23436
Considering the following features:
import pandas as pd

test_df = pd.DataFrame(
    {"col1": ["red", "blue", "blue"], "col2": ["car", None, "plane"]}
)
when using the encoder:
from sklearn.preprocessing import OneHotEncoder

ohe = OneHotEncoder(
    handle_unknown="ignore",
    sparse_output=False,
    # handle_missing="ignore"
)
ohe.fit_transform(test_df)
I get the output:
array([[0., 1., 1., 0., 0.],
       [1., 0., 0., 0., 1.],
       [1., 0., 0., 1., 0.]])
but what I'm actually looking for is to drop the None category, i.e. not create a new feature for it but set all the other features of that column to zero:
array([[0., 1., 1., 0.],
       [1., 0., 0., 0.],
       [1., 0., 0., 1.]])
Is there a way to achieve this without using another transformer object?