-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
[MRG + 1] Remove "warn_on_dtype" from check_array #13382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG + 1] Remove "warn_on_dtype" from check_array #13382
Conversation
merge fork
…https://github.com/praths007/scikit-learn into remove_warn_on_dtype_from_check_array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's very hard to focus on the substance of your pr with all the other changes
@jnothman shouldn't there be a The |
Overall I'm not yet committed to this change. We had problems with the
other use of warn_on_dtype and I'm not sure we gain from removing it here.
But do you mean that we should issue a ChangedBehaviorWarning in cases
where we used to issue a conversion warning?
|
…to remove_warn_on_dtype_from_check_array
There are places where test cases break because of raising a DeprecationWarning:
So if there is a DeprecationWarning raised it will still break the test case as assert will be wrong. Is there any way around this? |
Or even more conservative, go through a deprecation cycle? We're changing the behaviour of a public function, unless it's a bugfix we need a deprecation cycle right? (I'm not suggesting we should do that, overall I'm just curious about when deprecation cycles are relevant and when they're not) |
We're changing a side-effect rather than a behaviour of a public
function... I don't think there's any real need to deprecate.
|
I believe that my limited understanding of this issue causes this PR to not be efficacious. Should I go ahead and close it? Perhaps someone else can provide a better solution. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments.
Overall I'm not yet committed to this change
So you've changed your mind since #13324 (comment)?
I believe that my limited understanding of this issue causes this PR to not be efficacious
You're almost there @praths007 !
What this PR needs to do is:
- deprecate
warn_on_dtype
fromcheck_array
and keep the current behaviour (i.e. warning on dtype conversion). - remove its use from
check_pariwise_arrays
. No need for a deprecation cycle here.
You've already done most of it (please see my comments below).
Though I would suggest to wait for Joel's feedback before going back to it to avoid unnecessary work...
@@ -417,6 +418,11 @@ def test_check_array_dtype_warning(): | |||
assert_equal(X_checked.format, 'csr') | |||
|
|||
|
|||
def test_check_array_warn_on_dtype_deprecation(): | |||
X = np.asarray([[0.0], [1.0]]) | |||
assert_warns(DeprecationWarning, check_array, X, warn_on_dtype=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use pytest.wars with the match
parameter as shown here, instead of assert_warns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments.
Overall I'm not yet committed to this change
So you've changed your mind since #13324 (comment)?
I believe that my limited understanding of this issue causes this PR to not be efficacious
You're almost there @praths007 !
What this PR needs to do is:
- deprecate
warn_on_dtype
fromcheck_array
and keep the current behaviour (i.e. warning on dtype conversion). - remove its use from
check_pariwise_arrays
. No need for a deprecation cycle here.
You've already done most of it (please see my comments below).
Though I would suggest to wait for Joel's feedback before going back to it to avoid unnecessary work...
I think my opinion then was fairly ambivalent, not a strong vote for its
removal. Sorry to be ambiguous.
|
@praths007 , as discussed here #13324 (comment), this PR would need an update to implement the following point:
LMK if you need help or if you'd rather someone else take it over |
Thank You @NicolasHug, Majority of the tests cases pass now except documentation (which I am assuming fails because the docstring has now changed from False to None). Is there someplace this change needs to be reflected . |
@@ -122,6 +122,10 @@ def test_pairwise_distances(): | |||
# Test that a value error is raised if the metric is unknown | |||
assert_raises(ValueError, pairwise_distances, X, Y, metric="blah") | |||
|
|||
# Test boolean metric raises dataconversion warning | |||
assert_warns_message(DataConversionWarning, 'boolean', pairwise_distances, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use pytest here, and a longer match for the message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -712,7 +725,10 @@ def test_check_dataframe_warns_on_dtype(): | |||
check_array, df, dtype=np.float64, warn_on_dtype=True) | |||
assert_warns(DataConversionWarning, check_array, df, | |||
dtype='numeric', warn_on_dtype=True) | |||
assert_no_warnings(check_array, df, dtype='object', warn_on_dtype=True) | |||
with pytest.warns(DeprecationWarning, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use pytest.warn(None)
and put this in a separate test if you need to ignore the deprecation warning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
circle ci failure fix
…to remove_warn_on_dtype_from_check_array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more comments but this is looking good
Please also add an entry to the change log at doc/whats_new/v*.rst
. Like the other entries there, please reference this pull request with :issue:
and credit yourself with :user:
.
sklearn/utils/validation.py
Outdated
if warn_on_dtype is not None: | ||
warnings.warn( | ||
"'warn_on_dtype' is deprecated in " | ||
"version 0.21 and will be removed in 0.23 " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"version 0.21 and will be removed in 0.23 " | |
"version 0.21 and will be removed in 0.23. " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
last comments but this LGTM otherwise
Thanks @praths007 for sticking to it!!
Also please update this PR's header since it's a bit outdated now ;)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bunch of mostly cosmetic comments to address before merging.
sklearn/metrics/pairwise.py
Outdated
|
||
if dtype == bool and (X.dtype != bool or Y.dtype != bool): | ||
msg = ("Data was converted to boolean " | ||
"for metric %s" % (metric)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cosmetics: this fits on one line and there is no need to wrap the metric variable with parenthesis:
msg = "Data was converted to boolean for metric %s" % metric
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
# non-boolean arrays are converted to boolean for boolean | ||
# distance metrics with a data conversion warning | ||
msg = ("Data was converted to boolean " | ||
"for metric %s" % (metric)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same remark:
msg = "Data was converted to boolean for metric %s" % metric
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
def test_no_data_conversion_warning(): | ||
# No warnings issued if metric is not a | ||
# boolean distance function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, no need to split such comments on multiple lines before the 79 chars limit of PEP8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -687,6 +701,7 @@ def test_suppress_validation(): | |||
assert_raises(ValueError, assert_all_finite, X) | |||
|
|||
|
|||
@pytest.mark.filterwarnings("ignore: 'warn_on_dtype' is deprecated") # 0.23 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of ignoring the warning, we should update the test to stop passing the warn_on_dtype
argument as test_check_array_series
is actually not concerned with the warn_on_dtype
thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
sklearn/utils/validation.py
Outdated
warnings.warn( | ||
"'warn_on_dtype' is deprecated in " | ||
"version 0.21 and will be removed in 0.23. " | ||
"Use 'warn_on_dtype=None' to remove this warning.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This phrasing is problematic: we cannot tell people to explicitly set a parameter that will be removed at a release. This will break their code when they upgrade.
Instead we should tell them to not set the warn_on_dtype
parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also please do not use that many line breaks to format this string either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
sklearn/utils/validation.py
Outdated
Raise DataConversionWarning if the dtype of the input data structure | ||
does not match the requested dtype, causing a memory copy. | ||
|
||
.. deprecated:: 0.21 | ||
``warn_on_dtype`` is deprecated in | ||
version 0.21 and will be removed in 0.23. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PEP8 says the limit is at 79 so there is no need to break that early:
.. deprecated:: 0.21
``warn_on_dtype`` is deprecated in version 0.21 and will be
removed in 0.23.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
sklearn/utils/validation.py
Outdated
Raise DataConversionWarning if the dtype of the input data structure | ||
does not match the requested dtype, causing a memory copy. | ||
|
||
.. deprecated:: 0.21 | ||
``warn_on_dtype`` is deprecated in | ||
version 0.21 and will be removed in 0.23. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
merge fork
merge sklearn-master
Comments have been addressed, merging Thanks @praths007 ! |
This reverts commit ac23178.
This reverts commit ac23178.
Reference Issues/PRs:
Remove "warn_on_dtype" from check_array #13324
Implemented changes:
Changes made: