10000 [MRG + 1] Remove "warn_on_dtype" from check_array by praths007 · Pull Request #13382 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

[MRG + 1] Remove "warn_on_dtype" from check_array #13382

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

praths007
Copy link
Contributor
@praths007 praths007 commented Mar 4, 2019

Reference Issues/PRs:

Remove "warn_on_dtype" from check_array #13324

Implemented changes:

Changes made:

  • deprecated warn_on_dtype from check_array and check_X_y
  • the only occurrence of a DataConversionWarning is now handled directly in check_pairwise_array since this is only related to boolean metrics.

@praths007 praths007 changed the title [WIP] Remove "warn_on_dtype" from check_array Remove "warn_on_dtype" from check_array Mar 4, 2019
Copy link
Member
@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's very hard to focus on the substance of your pr with all the other changes

@NicolasHug
Copy link
Member

@jnothman shouldn't there be a ChangedBehaviourWarning warning in check_pairwise_arrays?

The DataConversionWarning won't be raised anymore regardless of the dtype param of check_pariwise_array.

@jnothman
Copy link
Member
jnothman commented Mar 19, 2019 via email

@praths007
Copy link
Contributor Author

There are places where test cases break because of raising a DeprecationWarning:

  1. Some of these test cases actually check the number of warnings being raised like test_affinity_propagation_convergence_warning_dense_sparse
  2. Others have assert statements that indirectly check for warnings using assert_no_warnings

So if there is a DeprecationWarning raised it will still break the test case as assert will be wrong. Is there any way around this?

@NicolasHug
Copy link
Member

But do you mean that we should issue a ChangedBehaviorWarning in cases where we used to issue a conversion warning?

Or even more conservative, go through a deprecation cycle? We're changing the behaviour of a public function, unless it's a bugfix we need a deprecation cycle right?

(I'm not suggesting we should do that, overall I'm just curious about when deprecation cycles are relevant and when they're not)

@jnothman
Copy link
Member
jnothman commented Mar 19, 2019 via email

@praths007
Copy link
Contributor Author

I believe that my limited understanding of this issue causes this PR to not be efficacious. Should I go ahead and close it? Perhaps someone else can provide a better solution.

Copy link
Member
@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments.

@jnothman

Overall I'm not yet committed to this change

So you've changed your mind since #13324 (comment)?


I believe that my limited understanding of this issue causes this PR to not be efficacious

You're almost there @praths007 !

What this PR needs to do is:

  • deprecate warn_on_dtype from check_array and keep the current behaviour (i.e. warning on dtype conversion).
  • remove its use from check_pariwise_arrays. No need for a deprecation cycle here.

You've already done most of it (please see my comments below).

Though I would suggest to wait for Joel's feedback before going back to it to avoid unnecessary work...

@@ -417,6 +418,11 @@ def test_check_array_dtype_warning():
assert_equal(X_checked.format, 'csr')


def test_check_array_warn_on_dtype_deprecation():
X = np.asarray([[0.0], [1.0]])
assert_warns(DeprecationWarning, check_array, X, warn_on_dtype=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use pytest.wars with the match parameter as shown here, instead of assert_warns

Copy link
Member
@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments.

@jnothman

Overall I'm not yet committed to this change

So you've changed your mind since #13324 (comment)?


I believe that my limited understanding of this issue causes this PR to not be efficacious

You're almost there @praths007 !

What this PR needs to do is:

  • deprecate warn_on_dtype from check_array and keep the current behaviour (i.e. warning on dtype conversion).
  • remove its use from check_pariwise_arrays. No need for a deprecation cycle here.

You've already done most of it (please see my comments below).

Though I would suggest to wait for Joel's feedback before going back to it to avoid unnecessary work...

@jnothman
Copy link
Member
jnothman commented Mar 23, 2019 via email

@NicolasHug
Copy link
Member

@praths007 , as discussed here #13324 (comment), this PR would need an update to implement the following point:

  • raise DataConversionWarning in pairwise_distances, in case metric in PAIRWISE_BOOLEAN_FUNCTIONS and the dtype of the array isn't already bool:

    The input data need to be converted to bool for boolean metric {metric}. This will create a copy. To remove this warning, explicitly convert the data using e.g. X = X.astype(np.bool) (same for Y if needed).

LMK if you need help or if you'd rather someone else take it over

@praths007
Copy link
Contributor Author

Thank You @NicolasHug, Majority of the tests cases pass now except documentation (which I am assuming fails because the docstring has now changed from False to None). Is there someplace this change needs to be reflected .

@@ -122,6 +122,10 @@ def test_pairwise_distances():
# Test that a value error is raised if the metric is unknown
assert_raises(ValueError, pairwise_distances, X, Y, metric="blah")

# Test boolean metric raises dataconversion warning
assert_warns_message(DataConversionWarning, 'boolean', pairwise_distances,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use pytest here, and a longer match for the message

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -712,7 +725,10 @@ def test_check_dataframe_warns_on_dtype():
check_array, df, dtype=np.float64, warn_on_dtype=True)
assert_warns(DataConversionWarning, check_array, df,
dtype='numeric', warn_on_dtype=True)
assert_no_warnings(check_array, df, dtype='object', warn_on_dtype=True)
with pytest.warns(DeprecationWarning,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use pytest.warn(None) and put this in a separate test if you need to ignore the deprecation warning

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member
@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more comments but this is looking good

Please also add an entry to the change log at doc/whats_new/v*.rst. Like the other entries there, please reference this pull request with :issue: and credit yourself with :user:.

if warn_on_dtype is not None:
warnings.warn(
"'warn_on_dtype' is deprecated in "
"version 0.21 and will be removed in 0.23 "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"version 0.21 and will be removed in 0.23 "
"version 0.21 and will be removed in 0.23. "

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member
@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

last comments but this LGTM otherwise

Thanks @praths007 for sticking to it!!

@NicolasHug
Copy link
Member

Also please update this PR's header since it's a bit outdated now ;)

  • deprecated warn_on_dtype from check_array and check_X_y
  • the only occurrence of a DataConversionWarning is now handled directly in check_pairwise_array since this is only related to boolean metrics.

@NicolasHug NicolasHug changed the title Remove "warn_on_dtype" from check_array [MRG + 1] Remove "warn_on_dtype" from check_array Apr 5, 2019
Copy link
Member
@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bunch of mostly cosmetic comments to address before merging.


if dtype == bool and (X.dtype != bool or Y.dtype != bool):
msg = ("Data was converted to boolean "
"for metric %s" % (metric))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cosmetics: this fits on one line and there is no need to wrap the metric variable with parenthesis:

            msg = "Data was converted to boolean for metric %s" % metric

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

# non-boolean arrays are converted to boolean for boolean
# distance metrics with a data conversion warning
msg = ("Data was converted to boolean "
"for metric %s" % (metric))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same remark:

    msg = "Data was converted to boolean for metric %s" % metric

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


def test_no_data_conversion_warning():
# No warnings issued if metric is not a
# boolean distance function
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, no need to split such comments on multiple lines before the 79 chars limit of PEP8.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -687,6 +701,7 @@ def test_suppress_validation():
assert_raises(ValueError, assert_all_finite, X)


@pytest.mark.filterwarnings("ignore: 'warn_on_dtype' is deprecated") # 0.23
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of ignoring the warning, we should update the test to stop passing the warn_on_dtype argument as test_check_array_series is actually not concerned with the warn_on_dtype thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

warnings.warn(
"'warn_on_dtype' is deprecated in "
"version 0.21 and will be removed in 0.23. "
"Use 'warn_on_dtype=None' to remove this warning.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This phrasing is problematic: we cannot tell people to explicitly set a parameter that will be removed at a release. This will break their code when they upgrade.

Instead we should tell them to not set the warn_on_dtype parameter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please do not use that many line breaks to format this string either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Raise DataConversionWarning if the dtype of the input data structure
does not match the requested dtype, causing a memory copy.

.. deprecated:: 0.21
``warn_on_dtype`` is deprecated in
version 0.21 and will be removed in 0.23.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP8 says the limit is at 79 so there is no need to break that early:

        .. deprecated:: 0.21
            ``warn_on_dtype`` is deprecated in version 0.21 and will be
             removed in 0.23.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Raise DataConversionWarning if the dtype of the input data structure
does not match the requested dtype, causing a memory copy.

.. deprecated:: 0.21
``warn_on_dtype`` is deprecated in
version 0.21 and will be removed in 0.23.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@NicolasHug NicolasHug merged commit 4b9e12e into scikit-learn:master Apr 19, 2019
@NicolasHug
Copy link
Member
NicolasHug commented Apr 19, 2019

Comments have been addressed, merging

Thanks @praths007 !

jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Apr 25, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0