8000 MAINT use nanmin to replace nan by finite values in ranking of SearchCV by glemaitre · Pull Request #24543 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

MAINT use nanmin to replace nan by finite values in ranking of SearchCV #24543

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Oct 13, 2022

Conversation

glemaitre
Copy link
Member
@glemaitre glemaitre commented Sep 29, 2022

Fixes #24424

For SciPy >= 1.10, rankdata does not deal with nan.
When finding the minimum finite values, we should be using np.nanmin and not min to discard nan values. The subsequent casting is therefore safe.

< 8000 /div>
@glemaitre
Copy link
Member Author

This PR should solve the remaining failure shown in the CIs using the dev versions.

@glemaitre
Copy link
Member Author
glemaitre commented Sep 29, 2022

pinging @cmarmo @Micky774 @ogrisel @betatim since you reviewed the previous PR (#24483).

Copy link
Member
@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you run with [scipy-dev] to see if this fixes the issues in CI?

@glemaitre
Copy link
Member Author

Done. It will still fail until #24521 is in main ;)

@thomasjpfan
Copy link
Member

I just merged #24521. ☺️

@glemaitre
Copy link
Member Author

OK merged main in this branch and trigger the CI again.

Copy link
Member
@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, the behavior on with scikit-learn 1.1 and SciPy 1.8 is:

from scipy.stats import rankdata
import numpy as np

array_means = np.asarray([np.nan] * 3)
min_array_means = min(array_means) - 1
array_means = np.nan_to_num(array_means, copy=True, nan=min_array_means)
rankdata(-array_means, method="min")
# array([1, 2, 3])

With this PR and the nanmin fix:

from scipy.stats import rankdata
import numpy as np

array_means = np.asarray([np.nan] * 3)
min_array_means = np.nanmin(array_means) - 1
if np.isnan(min_array_means):
    min_array_means = 0
array_means = np.nan_to_num(array_means, copy=True, nan=min_array_means)
rankdata(-array_means, method="min")
# array([1, 1, 1])

Even tho this is a edge case, it is backward breaking change. If we move forward with this behavior change, I think it belongs in the change log.

Furthermore, for array_means = np.asarray([np.nan] * 3 + [1, 2, 3]), the rank_results are:

  • SciPy 1.8 and on scikit-learn 1.1 -> rank_results=array([4, 4, 4, 3, 2, 1])
  • SciPy 1.10 and nanmin fix -> rank_results=array([4, 5, 6, 3, 2, 1])

@@ -968,7 +968,10 @@ def _store(key_name, array, weights=None, splits=False, rank=False):
# when input is nan, scipy >= 1.10 rankdata returns nan. To
# keep previous behaviour nans are set to be smaller than the
# minimum value in the array before ranking
min_array_means = min(array_means) - 1
min_array_means = np.nanmin(array_means) - 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When array_means is all nan, there is a RuntimeWarning:

RuntimeWarning: All-NaN slice encountered

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can either catch it or check for all nans first before calling nanmin.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I catch this case early.

@glemaitre
10000 Copy link
Member Author
glemaitre commented Sep 29, 2022 via email

@thomasjpfan
Copy link
Member

We can always get this behaviour and make it a bug fix with a change log entry. WDYT?

I am okay with making it a bug fix and adding it to the change log.

Copy link
Member
@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM

glemaitre and others added 3 commits October 13, 2022 14:48
Co-authored-by: Tim Head <betatim@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Copy link
Member
@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to add or update a test to check more precisely for the new behavior ;)

@ogrisel
Copy link
Member
ogrisel commented Oct 13, 2022

And also please re-push a commit for [scipy-dev].

glemaitre and others added 3 commits October 13, 2022 16:42
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@glemaitre
Copy link
Member Author

I modified the previous test to assert the new behaviour. It would fail on main.

@glemaitre
Copy link
Member Author

@ogrisel It is green ;)

Copy link
Member
@lesteve lesteve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, do we need to have a similar description in two separate places of the changelog?

I tried to suggest a wording tweak to make it sligthly more clear, not 100% sure I succeeded.

Copy link
Member
@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (once @lesteve's feedback has been taken into account).

glemaitre and others added 2 commits October 13, 2022 18:04
Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
@glemaitre
Copy link
Member Author

Thanks @lesteve. I will be merging since I don't run the right CI anyway.

@glemaitre glemaitre merged commit c0b3385 into scikit-learn:main Oct 13, 2022
glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request Oct 31, 2022
…CV (scikit-learn#24543)


Co-authored-by: Tim Head <betatim@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

⚠️ CI failed on Linux_Nightly.pylatest_pip_scipy_dev ⚠️
5 participants
0