Constant imputation treats only the value given by missing_values (np.nan by default) as missing, so a None
stays in place by default:
from sklearn.impute import SimpleImputer
import numpy as np
# Mixing numbers with None gives a dtype=object array
X = np.array([1, 2, np.nan, None]).reshape(-1, 1)
SimpleImputer(strategy='constant', fill_value="asdf").fit_transform(X)
array([[1],
[2],
['asdf'],
[None]], dtype=object)
However, using strategy='mean' coerces the None to NaN and so both are replaced:
SimpleImputer(strategy='mean').fit_transform(X)
array([[1. ],
[2. ],
[1.5],
[1.5]])
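The coercion can be reproduced in plain NumPy. This is only a sketch, assuming the 'mean' path casts the object array to float before looking for missing values (which is what the output above suggests); the names X_float and n_missing are made up for illustration:

```python
import numpy as np

# Same input as above: mixing numbers with None gives dtype=object
X = np.array([1, 2, np.nan, None]).reshape(-1, 1)

# Casting object -> float turns None into nan, so the two can no
# longer be told apart afterwards
X_float = X.astype(float)

# Both the original nan and the former None now count as missing
n_missing = int(np.isnan(X_float).sum())
```

After the cast there is no way to recover which entry was originally None, which is why both get the mean.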
I don't think the definition of what counts as missing should depend on the strategy. @thomasjpfan argues that the current constant behavior is inconvenient, because you have to impute None and NaN separately if you want to one-hot-encode afterwards.
It seems safer to treat them differently, but I'm not sure there's a use case for that. This came up in #17317.
I think this only matters in these two, as other imputers don't allow dtype object arrays.
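Until this is settled, one possible workaround is to map None to NaN before imputing, so that a single constant imputation covers both. This is a rough sketch, not a proposed fix; none_to_nan is a hypothetical helper, not part of scikit-learn:

```python
import numpy as np
from sklearn.impute import SimpleImputer


def none_to_nan(X):
    """Return an object-dtype copy of X with every None replaced by np.nan.

    Hypothetical helper for illustration only.
    """
    X = np.array(X, dtype=object)  # np.array copies by default
    # Elementwise identity check; frompyfunc returns an object array of bools
    is_none = np.frompyfunc(lambda v: v is None, 1, 1)(X).astype(bool)
    X[is_none] = np.nan
    return X


X = np.array([1, 2, np.nan, None]).reshape(-1, 1)
X_clean = none_to_nan(X)

# With None mapped to nan, one constant imputation fills both entries
imputed = SimpleImputer(strategy="constant", fill_value="asdf").fit_transform(X_clean)
```

This deliberately throws away the None/NaN distinction, so it only makes sense when the two really do mean the same thing in the data.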