Incorrect dtype from check_array validation with Pandas DataFrame · Issue #15795 · scikit-learn/scikit-learn · GitHub

Incorrect dtype from check_array validation with Pandas DataFrame #15795
@frances-h

Description

check_array in sklearn.utils.validation does not convert the input to the requested dtype when both of the following hold:

  1. dtype is a list or a tuple, and
  2. all DataFrame column dtypes are valid numpy dtypes, and the numpy result type of those column dtypes is in the dtype list/tuple
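
To make condition 2 concrete, here is a minimal sketch that checks it for the frame used in the reproduction steps of this report. It uses only numpy and pandas and does not call check_array; the helper names (all_numpy, common, in_requested, plain) are made up for illustration. The last line shows what a plain conversion of the same frame yields, which may explain the object dtype reported here, though the report itself does not confirm the cause.

```python
import numpy as np
import pandas as pd

# Same frame as in the reproduction steps of this report.
example = pd.DataFrame({'id': range(1, 5)})
example['float'] = [0, 0.1, 2.0, 3.1]
example['label'] = [True, False, True, False]

FLOAT_DTYPES = (np.float64, np.float32, np.float16)

# Condition 2, first half: every column dtype is a plain numpy dtype.
all_numpy = all(isinstance(dt, np.dtype) for dt in example.dtypes)

# Condition 2, second half: the common numpy result type of the columns
# is already one of the requested dtypes.
common = np.result_type(*example.dtypes)
in_requested = common in FLOAT_DTYPES

# For comparison: the dtype a plain conversion of the mixed frame produces.
plain = np.asarray(example).dtype

print(all_numpy, common, in_requested, plain)
```

If both halves of condition 2 are true while the plain conversion is not a float dtype, the frame falls into the case described above.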

Steps/Code to Reproduce

import numpy as np
import pandas as pd

from sklearn.utils import check_array

example = pd.DataFrame({'id': range(1, 5)})
example['float'] = [0, 0.1, 2.0, 3.1]
example['label'] = [True, False, True, False]

FLOAT_DTYPES = (np.float64, np.float32, np.float16)
a = check_array(example, dtype=FLOAT_DTYPES)

Expected Results

>>> a.dtype
dtype('float64')

Actual Results

>>> a.dtype
dtype('O')
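
A possible workaround, sketched under the assumption that casting every column to float64 is acceptable for the data (booleans become 0.0 and 1.0): cast the frame explicitly before validation. This is not a fix for check_array itself, and the name as_float is hypothetical.

```python
import numpy as np
import pandas as pd

# Same frame as in the reproduction steps of this report.
example = pd.DataFrame({'id': range(1, 5)})
example['float'] = [0, 0.1, 2.0, 3.1]
example['label'] = [True, False, True, False]

# Cast all columns to a single float dtype up front, so the frame
# no longer mixes int, float and bool columns.
as_float = example.astype(np.float64)

# The homogeneous frame converts to a float64 array rather than object.
a = as_float.to_numpy()
print(a.dtype, a.shape)
```

The cast frame as_float can then be passed to check_array in place of example. Since the report ties the problem to dtype being a list or tuple, passing a single dtype such as np.float64 may also avoid it, but that is not verified here.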

Versions

System:
python: 3.7.4 (default, Oct 7 2019, 17:26:17) [Clang 10.0.1 (clang-1001.0.46.4)]
machine: Darwin-18.7.0-x86_64-i386-64bit

Python dependencies:
sklearn: 0.22
numpy: 1.17.4
scipy: 1.3.3
pandas: 0.25.3
