-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
check_array does not raise error when input contains something other than numbers or strings #11401
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Maybe we need something more custom like |
Well it would only be used in estimators that allows non-numeric data, so maybe it's worth to have an overhead for those, I'm not sure. The thing is I think the above code shouldn't pass, don't you ? |
I think we should have named the ASSUME_FINITE setting ASSUME_DATA_VALID
and we could use it here
|
I agree with Joel that I am not sure the overhead to infer the type of all elements in X is worth the error message. But partly related to this, Currently, what I do in the encoders is something like:
because there is currently no way to indicate "numeric-or-object" (meaning: if it is numeric, keep numeric, if it is object keep object (up to here it would be |
I did some tests and the overhead is indeed quite important. That being said, there is still a issue with the edit : the comment of this test is
Assuming you're talking about |
Now that imputers allow inputs with object dtype, e.g. strings or pandas categoricals, it seems that either
check_array
should be enhanced or that some common tests should be updated.There is common test,
check_dtype_object
, that checks the estimators on inputX
that contains numbers andX[0,0] = {'foo':'bar'}
. When expecting numeric inputs, thecheck_array
is instanciated withdtype='numeric'
and an error is raised as expected.However, when instanciated with
dtype=None
ordtype=object
, no error is raised. See the code below:No error is raised and the estimator works fine. Don't you think that we should raise an error in that case ?
This currently passes the test because when imputing on inputs with object dtypes, we can't set
dtype='numeric'
incheck_array
. I think the error should be raised even withdtype=object
ordtype=None
.We could check that in the
fit
function of the estimators that accept non-numeric inputs, but I think it the role ofcheck_array
to do that. What's your opinion about that ?The text was updated successfully, but these errors were encountered: