ENH `learning_curve` raises a warning on failure of CV folds by rprkh · Pull Request #26299 · scikit-learn/scikit-learn · GitHub

ENH learning_curve raises a warning on failure of CV folds #26299


Merged
15 commits merged into scikit-learn:main from rprkh:enh_learning_curve on Nov 17, 2023

Conversation

@rprkh (Contributor) commented Apr 29, 2023

Reference Issues/PRs

Fixes #22057

What does this implement/fix? Explain your changes.

Raises the required warning when some of the CV folds fail for `learning_curve`.
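
For illustration, a minimal sketch of the behaviour this PR adds; the data, estimator, and `cv=10` below mirror the test added in this PR, and before this change the failed fits were recorded as `error_score` without any warning:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.svm import SVC

# Unshuffled 3-class data: with cv=10 and the default train_sizes, the
# smallest training subsets contain a single class, so SVC.fit fails there.
X, y = make_classification(n_classes=3, n_informative=6, shuffle=False)

# With this PR, a FitFailedWarning is emitted for the failed fits; their
# scores are set to error_score (np.nan here).
learning_curve(SVC(), X, y, cv=10, error_score=np.nan)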

Any other comments?


@rprkh changed the title from "learning_curve raises warnings on failure of folds" to "ENH learning_curve raises warnings on failure of folds" Apr 29, 2023
@rprkh changed the title from "ENH learning_curve raises warnings on failure of folds" to "ENH learning_curve raises a warning on failure of CV folds" Apr 29, 2023
@thomasjpfan (Member) left a comment

Thank you for the PR!

Comment on lines 2391 to 2392
@pytest.mark.parametrize("error_score", [np.nan])
def test_learning_curve_some_failing_fits_warning(error_score, global_random_seed):

For this specific test, I do not think we need the global_random_seed.

Similarly, since there is only one error_score, we do not need to parametrize it.
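
As a sketch of what applying both suggestions implies (hypothetical simplification, keeping the test name from this PR, body elided):

import numpy as np


def test_learning_curve_some_failing_fits_warning():
    # no global_random_seed fixture and no parametrization: the single
    # error_score value (np.nan) is used directly inside the test body
    error_score = np.nan
    ...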

Comment on lines 2411 to 2413
train_sizes, train_scores, test_scores = learning_curve(
    svc, X, y, cv=10, error_score=error_score
)

There is no need to store the return values:

Suggested change
train_sizes, train_scores, test_scores = learning_curve(
    svc, X, y, cv=10, error_score=error_score
)
learning_curve(svc, X, y, cv=10, error_score=np.nan)

Comment on lines 416 to 417
- |Enhancement| :func:`model_selection.learning_curve` raises a warning on
  failure of the `cv` folds. :pr:`26299` by :user:`Rahil Parikh <rprkh>`.

Suggested change
- |Enhancement| :func:`model_selection.learning_curve` raises a warning on
  failure of the `cv` folds. :pr:`26299` by :user:`Rahil Parikh <rprkh>`.
- |Enhancement| :func:`model_selection.learning_curve` raises a warning when
  every cross validation fold fails. :pr:`26299` by :user:`Rahil Parikh <rprkh>`.

@rprkh (Contributor, Author) commented Jun 2, 2023

Added the changes @thomasjpfan

@thomasjpfan (Member) left a comment

Thank you for the update! LGTM

@thomasjpfan added the Waiting for Second Reviewer (First reviewer is done, need a second one!) label on Jun 2, 2023
@glemaitre self-requested a review on November 16, 2023 15:47
github-actions (bot) commented Nov 16, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 429fefa.

@glemaitre (Member) left a comment

This looks good. We can just slightly improve the test.


X, y = make_classification(n_classes=3, n_informative=6, shuffle=False)
svc = SVC()
svc.fit(X, y)

Suggested change (removal of this line):
svc.fit(X, y)

Comment on lines 2426 to 2435
warning_message = re.compile(
    (
        "10 fits failed out of a total of 50.+The score on these train-test"
        " partitions for these parameters will be set to nan.+If these failures"
        " are not expected, you can try to debug them by setting"
        " error_score='raise'.+ValueError: The number of classes has to be greater"
        " than one; got 1 class"
    ),
    flags=re.DOTALL,
)

This would be enough to partially match the warning message.

Suggested change
warning_message = re.compile(
    (
        "10 fits failed out of a total of 50.+The score on these train-test"
        " partitions for these parameters will be set to nan.+If these failures"
        " are not expected, you can try to debug them by setting"
        " error_score='raise'.+ValueError: The number of classes has to be greater"
        " than one; got 1 class"
    ),
    flags=re.DOTALL,
)
warning_message = "10 fits failed out of a total of 50"

Comment on lines 2436 to 2437
with pytest.warns(FitFailedWarning, match=warning_message):
    learning_curve(svc, X, y, cv=10, error_score=np.nan)

We should also assert that we get the expected np.nan values.

Suggested change
with pytest.warns(FitFailedWarning, match=warning_message):
    learning_curve(svc, X, y, cv=10, error_score=np.nan)
with pytest.warns(FitFailedWarning, match=warning_message):
    _, train_score, test_score, *_ = learning_curve(svc, X, y, cv=10, error_score=np.nan)
# the first split should lead to raising the warning
assert np.isnan(train_score[0]).all()
assert np.isnan(test_score[0]).all()
for idx in range(1, train_score.shape[0]):
    assert not np.isnan(train_score[idx]).any()
    assert not np.isnan(test_score[idx]).any()

@rprkh (Contributor, Author) commented Nov 17, 2023

Included the suggestions @glemaitre

@glemaitre (Member) left a comment

LGTM

@glemaitre merged commit 4081170 into scikit-learn:main on Nov 17, 2023
@glemaitre (Member) commented

Thanks @rprkh. Merging.

@rprkh deleted the enh_learning_curve branch on November 17, 2023 15:57
Labels
module:model_selection, Waiting for Second Reviewer (First reviewer is done, need a second one!)
Development

Successfully merging this pull request may close these issues.

Raise warning when fit/score fail in learning curve (#22057)
3 participants