MAINT use nanmin to replace nan by finite values in ranking of SearchCV #24543

glemaitre · 2022-09-29T13:35:38Z

With SciPy >= 1.10, rankdata no longer ranks nan inputs and returns nan instead.
When looking for the minimum finite value, we should use np.nanmin rather than min so that nan values are discarded. The subsequent casting is therefore safe.
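
To illustrate why the builtin min is unreliable here, the following minimal sketch compares it with np.nanmin. The sample arrays are made up for illustration. Python's min relies on `<` comparisons, which are always False when nan is involved, so its result depends on where the nan sits.

```python
import numpy as np

# Hypothetical mean test scores; nan marks a failed fit (illustrative values).
nan_first = np.array([np.nan, 1.0, 2.0])
nan_middle = np.array([1.0, np.nan, 2.0])

# The builtin min keeps its running minimum unless a later element compares
# smaller, and every comparison against nan is False.
builtin_first = min(nan_first)    # nan: nothing compares smaller than nan
builtin_middle = min(nan_middle)  # 1.0: nan never replaces the running minimum

# np.nanmin ignores nan entries regardless of their position.
finite_first = np.nanmin(nan_first)
finite_middle = np.nanmin(nan_middle)

print(builtin_first, builtin_middle)  # nan 1.0
print(finite_first, finite_middle)    # 1.0 1.0
```

With np.nanmin the result no longer depends on the position of the failed candidates, which is what makes subtracting 1 and using the result as a fill value meaningful.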

glemaitre · 2022-09-29T13:36:08Z

This PR should solve the remaining failure shown in the CIs using the dev versions.

glemaitre · 2022-09-29T13:37:59Z

pinging @cmarmo @Micky774 @ogrisel @betatim since you reviewed the previous PR (#24483).

thomasjpfan

Can you run with [scipy-dev] to see if this fixes the issues in CI?

glemaitre · 2022-09-29T13:58:18Z

Done. It will still fail until #24521 is in main ;)

thomasjpfan · 2022-09-29T14:26:52Z

I just merged #24521. ☺️

glemaitre · 2022-09-29T14:31:58Z

OK, I merged main into this branch and triggered the CI again.

thomasjpfan

Overall, the behavior with scikit-learn 1.1 and SciPy 1.8 is:

from scipy.stats import rankdata
import numpy as np

array_means = np.asarray([np.nan] * 3)
min_array_means = min(array_means) - 1
array_means = np.nan_to_num(array_means, copy=True, nan=min_array_means)
rankdata(-array_means, method="min")
# array([1, 2, 3])

With this PR and the nanmin fix:

from scipy.stats import rankdata
import numpy as np

array_means = np.asarray([np.nan] * 3)
min_array_means = np.nanmin(array_means) - 1
if np.isnan(min_array_means):
    min_array_means = 0
array_means = np.nan_to_num(array_means, copy=True, nan=min_array_means)
rankdata(-array_means, method="min")
# array([1, 1, 1])

Even though this is an edge case, it is a backward-incompatible change. If we move forward with this behavior change, I think it belongs in the change log.

Furthermore, for array_means = np.asarray([np.nan] * 3 + [1, 2, 3]), the rank_results are:

SciPy 1.8 and scikit-learn 1.1 -> rank_results=array([4, 5, 6, 3, 2, 1])
SciPy 1.10 and nanmin fix -> rank_results=array([4, 4, 4, 3, 2, 1])
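
The nan replacement discussed above can be sketched as a small standalone helper. The function name rank_scores is hypothetical and this is not the code in _search.py; it only mirrors the snippets above, replacing nan by a value below the finite minimum so that failed candidates tie for the last rank.

```python
import numpy as np
from scipy.stats import rankdata


def rank_scores(array_means):
    """Rank scores in descending order, with nan entries tied for last place.

    Hypothetical helper for illustration only, not scikit-learn's code.
    """
    array_means = np.asarray(array_means, dtype=float)
    if np.isnan(array_means).all():
        # Nothing finite to compare against: any constant gives a full tie.
        fill_value = 0.0
    else:
        # Strictly smaller than every finite score.
        fill_value = np.nanmin(array_means) - 1
    filled = np.nan_to_num(array_means, copy=True, nan=fill_value)
    # Negate so that the highest score receives rank 1.
    return rankdata(-filled, method="min")


print(rank_scores([np.nan] * 3 + [1, 2, 3]))  # ranks: 4, 4, 4, 3, 2, 1
print(rank_scores([np.nan] * 3))              # ranks: 1, 1, 1
```

Because the nan entries are replaced by finite values before ranking, the result does not depend on how the installed SciPy version treats nan in rankdata.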

thomasjpfan · 2022-09-29T17:15:31Z

sklearn/model_selection/_search.py

@@ -968,7 +968,10 @@ def _store(key_name, array, weights=None, splits=False, rank=False):
                # when input is nan, scipy >= 1.10 rankdata returns nan. To
                # keep previous behaviour nans are set to be smaller than the
                # minimum value in the array before ranking
-                min_array_means = min(array_means) - 1
+                min_array_means = np.nanmin(array_means) - 1


When array_means is all nan, there is a RuntimeWarning:

RuntimeWarning: All-NaN slice encountered

We can either catch it or check for all nans first before calling nanmin.

I catch this case early.
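
Both options mentioned above can be sketched as follows. The helper names and the default fill value are hypothetical choices for this sketch: the first helper silences the RuntimeWarning around np.nanmin, the second tests for an all-nan array before calling it.

```python
import warnings

import numpy as np


def nanmin_catching_warning(values):
    """Option 1: call np.nanmin and silence the all-nan RuntimeWarning."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmin(values)  # nan when every entry is nan


def nanmin_checking_first(values, default=0.0):
    """Option 2: detect the all-nan case before calling np.nanmin."""
    if np.isnan(values).all():
        return default  # arbitrary placeholder chosen for this sketch
    return np.nanmin(values)


all_nan = np.full(3, np.nan)
print(nanmin_catching_warning(all_nan))  # nan
print(nanmin_checking_first(all_nan))    # 0.0
print(nanmin_checking_first(np.array([np.nan, 2.0, 5.0])))  # 2.0
```

Checking first avoids emitting the warning at all and makes the all-nan branch explicit, at the cost of one extra pass over the array.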

glemaitre · 2022-09-29T17:51:23Z

Ranking only nan has no meaning, and we should not give a ranking IMO. I would be fine with the current incompatibility. We can always get this behaviour and make it a bug fix with a change log entry. WDYT?

thomasjpfan · 2022-09-29T17:56:24Z

We can always get this behaviour and make it a bug fix with a change log entry. WDYT?

I am okay with making it a bug fix and adding it to the change log.

thomasjpfan

Otherwise LGTM

ogrisel

It would be great to add or update a test to check more precisely for the new behavior ;)

ogrisel · 2022-10-13T13:06:47Z

And also please re-push a commit for [scipy-dev].

glemaitre · 2022-10-13T14:45:00Z

I modified the previous test to assert the new behaviour. It would fail on main.

glemaitre · 2022-10-13T15:41:33Z

@ogrisel It is green ;)

lesteve

LGTM, do we need to have a similar description in two separate places of the changelog?

I tried to suggest a wording tweak to make it slightly clearer, not 100% sure I succeeded.

ogrisel

LGTM (once @lesteve's feedback has been taken into account).

glemaitre · 2022-10-13T16:05:48Z

Thanks @lesteve. I will be merging since I don't run the right CI anyway.

MAINT use nanmin to replace nan by finite values in ranking of SearchCV

af3056b

github-actions bot added the module:model_selection label Sep 29, 2022

handle the case with only nan values

c7363b6

thomasjpfan reviewed Sep 29, 2022

[scipy-dev] trigger CI dev builds

4f094b2

glemaitre added 2 commits September 29, 2022 16:31

Merge remote-tracking branch 'origin/main' into ranking_nan

b34df30

[scipy-dev] trigger CI dev builds

4d2784e

thomasjpfan reviewed Sep 29, 2022

glemaitre added 2 commits September 29, 2022 20:48

catch early the case of all failed fit/scoring routinesd

8837641

DOC add a changelog entry to document the change of behaviour of nan scores

349dab6

betatim reviewed Sep 30, 2022

betatim reviewed Sep 30, 2022

ogrisel reviewed Sep 30, 2022

thomasjpfan approved these changes Sep 30, 2022

thomasjpfan mentioned this pull request Oct 12, 2022

⚠️ CI failed on Linux_Nightly.pylatest_pip_scipy_dev ⚠️ #24424

Closed

glemaitre and others added 3 commits October 13, 2022 14:48

Update doc/whats_new/v1.2.rst

03c3618

Co-authored-by: Tim Head <betatim@gmail.com>

Update sklearn/model_selection/_search.py

2861a7b

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

thomas comment

78a51fa

ogrisel reviewed Oct 13, 2022

glemaitre and others added 3 commits October 13, 2022 16:42

TST add a check for new behaviour

17035bd

Update doc/whats_new/v1.2.rst

6390f8f

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

Update doc/whats_new/v1.2.rst

412e16e

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

[scipy-dev] trigger nightly builds CIs

ddae3c3

lesteve reviewed Oct 13, 2022

ogrisel approved these changes Oct 13, 2022

glemaitre and others added 2 commits October 13, 2022 18:04

Update doc/whats_new/v1.2.rst

8e33037

Co-authored-by: Loïc Estève <loic.esteve@ymail.com>

Update doc/whats_new/v1.2.rst

73d4ce0

Co-authored-by: Loïc Estève <loic.esteve@ymail.com>

glemaitre merged commit c0b3385 into scikit-learn:main Oct 13, 2022
