ENH learning_curve raises a warning on failure of CV folds #26299
Conversation
Thank you for the PR!
@pytest.mark.parametrize("error_score", [np.nan])
def test_learning_curve_some_failing_fits_warning(error_score, global_random_seed):
For this specific test, I do not think we need the global_random_seed fixture. Similarly, since there is only one error_score value, we do not need to parametrize it.
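A sketch of what that simplification could look like (hypothetical, not the committed version; error_score=np.nan would then be passed directly in the body, as in the suggestion below):

def test_learning_curve_some_failing_fits_warning():
    """Check that learning_curve warns when some CV fits fail."""
    # No parametrize decorator and no global_random_seed fixture: the single
    # error_score value (np.nan) is hard-coded where learning_curve is called.
    ...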
train_sizes, train_scores, test_scores = learning_curve(
    svc, X, y, cv=10, error_score=error_score
)
There is no need to store the return values here.

Suggested change:
- train_sizes, train_scores, test_scores = learning_curve(
-     svc, X, y, cv=10, error_score=error_score
- )
+ learning_curve(svc, X, y, cv=10, error_score=np.nan)
doc/whats_new/v1.3.rst (outdated)
- |Enhancement| :func:`model_selection.learning_curve` raises a warning on
  failure of the `cv` folds. :pr:`26299` by :user:`Rahil Parikh <rprkh>`.
Suggested change:
- - |Enhancement| :func:`model_selection.learning_curve` raises a warning on
-   failure of the `cv` folds. :pr:`26299` by :user:`Rahil Parikh <rprkh>`.
+ - |Enhancement| :func:`model_selection.learning_curve` raises a warning when
+   every cross validation fold fails. :pr:`26299` by :user:`Rahil Parikh <rprkh>`.
Added the changes @thomasjpfan
Thank you for the update! LGTM
This looks good. We can just slightly improve the test.
X, y = make_classification(n_classes=3, n_informative=6, shuffle=False)
svc = SVC()
svc.fit(X, y)
Suggested change:
- svc.fit(X, y)
warning_message = re.compile(
    (
        "10 fits failed out of a total of 50.+The score on these train-test"
        " partitions for these parameters will be set to nan.+If these failures"
        " are not expected, you can try to debug them by setting"
        " error_score='raise'.+ValueError: The number of classes has to be greater"
        " than one; got 1 class"
    ),
    flags=re.DOTALL,
)
This would be enough to partially match the warning message.

Suggested change:
- warning_message = re.compile(
-     (
-         "10 fits failed out of a total of 50.+The score on these train-test"
-         " partitions for these parameters will be set to nan.+If these failures"
-         " are not expected, you can try to debug them by setting"
-         " error_score='raise'.+ValueError: The number of classes has to be greater"
-         " than one; got 1 class"
-     ),
-     flags=re.DOTALL,
- )
+ warning_message = "10 fits failed out of a total of 50"
with pytest.warns(FitFailedWarning, match=warning_message):
    learning_curve(svc, X, y, cv=10, error_score=np.nan)
We should also add assertions checking that we get the expected np.nan scores.

Suggested change:
- with pytest.warns(FitFailedWarning, match=warning_message):
-     learning_curve(svc, X, y, cv=10, error_score=np.nan)
+ with pytest.warns(FitFailedWarning, match=warning_message):
+     _, train_score, test_score, *_ = learning_curve(
+         svc, X, y, cv=10, error_score=np.nan
+     )
+ # the first train size should lead to the warning being raised
+ assert np.isnan(train_score[0]).all()
+ assert np.isnan(test_score[0]).all()
+ for idx in range(1, train_score.shape[0]):
+     assert not np.isnan(train_score[idx]).any()
+     assert not np.isnan(test_score[idx]).any()
Included the suggestions @glemaitre
LGTM
Thanks @rprkh. Merging.
Reference Issues/PRs
Fixes #22057
What does this implement/fix? Explain your changes.
Raises the required warning when some of the folds fail for learning_curve.
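As a rough illustration of the new behavior (a sketch mirroring the test added in this PR, not extra API; names and sizes follow the test setup discussed above):

import numpy as np
import pytest
from sklearn.datasets import make_classification
from sklearn.exceptions import FitFailedWarning
from sklearn.model_selection import learning_curve
from sklearn.svm import SVC

# Unshuffled 3-class data: the smallest training subsets used by
# learning_curve contain a single class, so SVC.fit fails on them.
X, y = make_classification(n_classes=3, n_informative=6, shuffle=False)

# With this change, the failed fits trigger a FitFailedWarning instead of
# silently producing error_score values.
with pytest.warns(FitFailedWarning, match="fits failed out of a total of"):
    train_sizes, train_scores, test_scores = learning_curve(
        SVC(), X, y, cv=10, error_score=np.nan
    )

# Scores for the failed fits are set to error_score (np.nan here).
assert np.isnan(train_scores[0]).all()

The assertions mirror the reviewer's suggestion above: only the smallest train size produces nan scores.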
Any other comments?