Closed
Description
Describe the bug
- Passing a list of strings to SimpleImputer with strategy='most_frequent' (or 'constant') raises a ValueError.
Steps/Code to Reproduce
import numpy as np
from sklearn.impute import SimpleImputer
X = [['a', 'b', 'c'], ['d', 'e', np.nan]]
imp_mf = SimpleImputer(missing_values=np.nan, strategy='most_frequent')
transformed_mf = imp_mf.fit_transform(X)
print(transformed_mf)
Expected Results
[['a' 'b' 'c']
['d' 'e' 'c']]
Actual Results
--------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-10-6cee172813bf> in <module>()
6
7 imp_mf = SimpleImputer(missing_values=np.nan, strategy='most_frequent')
----> 8 transformed_mf = imp_mf.fit_transform(X)
9 print(transformed_mf)
2 frames
/content/scikit-learn/sklearn/impute/_base.py in _validate_input(self, X, in_fit)
258 "categorical data represented either as an array "
259 "with integer dtype or an array of string values "
--> 260 "with an object dtype.".format(X.dtype))
261
262 return X
ValueError: SimpleImputer does not support data with dtype <U3. Please provide either a numeric array (with a floating point or integer dtype) or categorical data represented either as an array with integer dtype or an array of string values with an object dtype..
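The dtype named in the error message probably comes from how NumPy converts the nested list, not from the imputer itself. The sketch below uses only NumPy and assumes the imputer converts list input roughly the way np.asarray does. That assumption is not confirmed by the traceback. The names as_str and as_obj are illustrative only. The exact string width of the default dtype varies between NumPy versions, so the sketch checks only the dtype kind.

```python
import numpy as np

# Same input as in the report: strings mixed with a float NaN.
X = [['a', 'b', 'c'], ['d', 'e', np.nan]]

# Default conversion: NumPy picks a unicode string dtype and
# stringifies np.nan, so the missing value marker is lost.
as_str = np.asarray(X)
print(as_str.dtype.kind)        # 'U' (unicode string dtype)
print(as_str[1, 2] == 'nan')    # True: NaN has become the text 'nan'

# Explicit object dtype: elements are stored as-is, so np.nan
# stays a float and can still be detected as missing.
as_obj = np.asarray(X, dtype=object)
print(as_obj.dtype.kind)                 # 'O' (object dtype)
print(isinstance(as_obj[1, 2], float))   # True
print(as_obj[1, 2] != as_obj[1, 2])      # True: NaN is not equal to itself
```

Since the error message lists "an array of string values with an object dtype" as supported input, passing as_obj instead of the raw list is a likely workaround, though it has not been run against the versions listed in this report.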
Versions
System:
python: 3.6.9 (default, Apr 18 2020, 01:56:04) [GCC 8.4.0]
executable: /usr/bin/python3
machine: Linux-4.19.104+-x86_64-with-Ubuntu-18.04-bionic
Python dependencies:
pip: 19.3.1
setuptools: 47.1.1
sklearn: 0.23.1
numpy: 1.18.4
scipy: 1.4.1
Cython: 0.29.19
pandas: 1.0.4
matplotlib: 3.2.1
joblib: 0.15.1
threadpoolctl: 2.1.0
Built with OpenMP: True