8000 Remove "warn_on_dtype" from check_array · Issue #13324 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

Remove "warn_on_dtype" from check_array #13324

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
amueller opened this issue Feb 28, 2019 · 15 comments
Closed

Remove "warn_on_dtype" from check_array #13324

amueller opened this issue Feb 28, 2019 · 15 comments
Labels
Easy Well-defined and straightforward way to resolve good first issue Easy with clear instructions to resolve help wanted

Comments

@amueller
Copy link
Member

We're not using warn_on_dtype any more (see #13306) and so we should deprecate it.

@amueller amueller added Easy Well-defined and straightforward way to resolve good first issue Easy with clear instructions to resolve help wanted labels Feb 28, 2019
@amueller
Copy link
Member Author
amueller commented Feb 28, 2019

Actually, it's still used in check_pairwise_array. Do we want to keep it there?

@2sang
Copy link
2sang commented Feb 28, 2019

Where could I find the function check_pairwise_array you mentioned?

@jnothman
Copy link
Member
jnothman commented Mar 1, 2019 via email

@jnothman
Copy link
Member

I'm not inspired by its use in pairwise... Happy to see it go. What would you change if you received that warning?

@jnothman
Copy link
Member

I suppose you'd consider casting your data first to avoid repeated recasting??

@praths007
Copy link
Contributor

Assuming that no warnings should be issued for datatype changes in arrays. If warn_on_dtype were to be removed from check_pairwise_array, shouldn't the parameter be removed from check_array (under /validation) as well? Because apart from check_pairwise_array no other method passes warn_on_dtype parameter to it.

Or is the assumption wrong and this feature of issuing warnings on type changes will be required going forward?

@jnothman
Copy link
Member
jnothman commented Mar 13, 2019 via email

@praths007
Copy link
Contributor
praths007 commented Mar 13, 2019

should that be done in a different PR? Because it might involve updating test cases under /pairwise and /validation

@jnothman
Copy link
Member

Sorry, I didn't understand the usage. There is no warn_on_dtype parameter to check_pairwise_arrays, it only makes use of check_array's warn_on_dtype parameter. We would only need to remove its usage in check_pairwise_arrays and deprecate its acceptance in check_array.

I don't know whether the warning is useful in pairwise metrics. @amueller?

@NicolasHug
Copy link
Member

I don't know whether the warning is useful in pairwise metrics. @amueller?

Here's what I found:

  1. check_pairwise_arrays is the only tool that makes use of the warn_on_dtype parameter of check_array. This is only the case when the dtype parameter of check_pairwise_arrays is not None.
  2. The only call to check_pairwise_arrays with a non-None dtype is in pairwise_distances(), for these metrics only:
# These distances recquire boolean arrays, when using scipy.spatial.distance
PAIRWISE_BOOLEAN_FUNCTIONS = [
    'dice',
    'jaccard',
    'kulsinski',
    'matching',
    'rogerstanimoto',
    'russellrao',
    'sokalmichener',
    'sokalsneath',
    'yule',
]

This was introduced in #6932. We need this because scipy computations are wrong if the arrays are floats: see #4523 (comment) for a summary.


May I propose the following:

The input data need to be converted to bool for boolean metric {metric}. This will create a copy. To remove this warning, explicitly convert the data using e.g. X = X.astype(np.bool) (same for Y if needed).

@jnothman
Copy link
Member
jnothman commented Mar 24, 2019 via email

@MohammedKhandwawala
Copy link

@amueller is anyone working on this? or can i take this issue.

@NicolasHug
Copy link
Member

@MohammedKhandwawala you can see in the feed here things like

Someone referenced this issue xx days ago

these are often pull requests addressing the issue (but not always).

Sorry, we're not good at updating labels

@rth
Copy link
Member
rth commented Jul 3, 2019

@NicolasHug This was fully fixed in #13382, right? Then we can also close #13434?

@NicolasHug
Copy link
Member

Yes, thanks for the reminder! Closing both issues

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Easy Well-defined and straightforward way to resolve good first issue Easy with clear instructions to resolve help wanted
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants
0