8000 ENH add zero_division=nan for classification metrics by marctorsoc · Pull Request #23183 · scikit-learn/scikit-learn · GitHub

ENH add zero_division=nan for classification metrics #23183


Closed
marctorsoc wants to merge 53 commits

Conversation

@marctorsoc (Contributor) commented Apr 21, 2022

Reference Issues/PRs

Fixes #22625

What does this implement/fix? Explain your changes.

This is an extension of #14900, where I added the parameter zero_division for precision, recall, and f1. Afterwards, it was added for jaccard as well.

Here, we add the ability to set zero_division to np.nan, so that np.nan is returned when the metric is undefined. In addition to this (see the example after this list):

  • when an average is computed, per-class values that are np.nan (undefined, then replaced via zero_division) are excluded from the average.
  • when beta=0, return precision.
  • when exactly one of (precision, recall) is defined and it equals 0, return fscore=0, even if the other metric is undefined.
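For illustration, this is the intended behaviour on a minimal example (assuming a scikit-learn version in which this feature has landed; the follow-up PRs shipped it in 1.3):

import numpy as np
from sklearn.metrics import precision_score, recall_score

y_true = [0, 0, 0, 1]
y_pred = [0, 0, 0, 0]

# No positive predictions -> pred_sum is 0, so precision is undefined:
# with zero_division=np.nan we get nan instead of 0, 1, or a warning.
precision_score(y_true, y_pred, zero_division=np.nan)  # nan

# There is one actual positive, so recall is well defined (and 0 here).
recall_score(y_true, y_pred, zero_division=np.nan)  # 0.0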

Specifically:

Precision:

  • If pred_sum = 0, the metric is undefined.
  • If average != None, any per-class value that is np.nan is excluded from the average.

Recall:

  • If true_sum = 0, the metric is undefined.
  • If average != None, any per-class value that is np.nan is excluded from the average.

F-score:

  • If beta=inf, return recall; if beta=0, return precision.
  • Elif precision=0 or recall=0 (or both), return 0. <------------- this is a change
  • Else, return zero_division.
  • If average != None, any per-class value that is np.nan is excluded from the average.

Jaccard:

  • If all labels and predictions are 0, return zero_division.
  • If average != None, any per-class value that is np.nan is excluded from the average.

A worked example of this averaging rule follows.
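To make the averaging rule concrete (again assuming a release that includes this feature):

import numpy as np
from sklearn.metrics import recall_score

# Class 2 never appears in y_true, so its recall is undefined (true_sum = 0).
y_true = [0, 1, 0, 1]
y_pred = [0, 1, 2, 1]

recall_score(y_true, y_pred, labels=[0, 1, 2], average=None,
             zero_division=np.nan)
# array([0.5, 1. , nan])

# The nan class is excluded from the macro average:
# (0.5 + 1.0) / 2 = 0.75, rather than a mean over all 3 classes.
recall_score(y_true, y_pred, labels=[0, 1, 2], average="macro",
             zero_division=np.nan)
# 0.75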

Any other comments?

marctorsoc and others added 16 commits September 7, 2019 10:17, among them:

  • merge commits resolving conflicts in doc/whats_new/v0.21.rst, sklearn/metrics/classification.py, and sklearn/metrics/tests/test_classification.py
  • F-score only warns if both prec and rec are ill-defined
  • new private method to simplify _prf_divide
  • merge commits resolving conflicts in sklearn/metrics/_classification.py and sklearn/metrics/tests/test_classification.py
  • add weights casting to np.array
  • further merge commits resolving conflicts in sklearn/metrics/_classification.py and sklearn/metrics/tests/test_classification.py
def _nan_average(scores: np.ndarray, weights: Optional[np.ndarray]):
    """
    Wrapper for np.average, with np.nan values being ignored from the average.
    This is similar to np.nanmean, but allows passing weights as in np.average.
    """
Contributor Author

Submitted an issue to numpy for this: numpy/numpy#21375, but let me know if there's a better solution than this wrapper!

Member

What is the performance difference for running _nan_average and np.average when there are no nans?

Contributor Author

I don't know offhand, but I updated this so that if no weights are passed it just does np.nanmean, which should be very fast. Otherwise, it creates the mask and does the extra work. To be honest, I cannot foresee a big degradation since the operations here are very quick, but 🤷‍♂️
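For context, a minimal sketch of what such a wrapper could look like given the behaviour described above (the signature matches the snippet under review, but the body is an assumption; the PR's actual helper may differ):

from typing import Optional

import numpy as np


def _nan_average(scores: np.ndarray, weights: Optional[np.ndarray] = None):
    """Average scores, ignoring np.nan values (np.average has no nan handling)."""
    if weights is None:
        # Fast path: with no weights, np.nanmean does exactly what we need.
        return np.nanmean(scores)
    # Otherwise, drop the nan entries and their weights, then defer to
    # np.average for the weighted mean of the remaining values.
    mask = ~np.isnan(scores)
    return np.average(np.asarray(scores)[mask], weights=np.asarray(weights)[mask])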

@marctorsoc changed the title from "[WIP] [FEAT] Zero division nan" to "[FEAT] Zero division nan" on Apr 22, 2022
@marctorsoc (Contributor Author)

@thomasjpfan and everyone interested, this is now ready to review :)

@thomasjpfan (Member) left a comment

Thank you for the PR!

I think it would be good to see what @jnothman thinks of this np.nan behavior.

Comment on lines 166 to 167
if (weights == 0).all():
    return np.average(scores)
Member

Checking for weights == 0 adds more computation for an edge case. Can we pass weights directly into np.average and not do this check?

Contributor Author

Unfortunately:

    ZeroDivisionError
        When all weights along axis are zero.

See https://numpy.org/doc/stable/reference/generated/numpy.average.html. But I will change it to a try/except.
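In other words, the weights == 0 pre-check can be replaced by catching the error that np.average itself raises, roughly like this (a sketch with a hypothetical helper name, not the PR's final code):

import numpy as np


def _weighted_average(scores, weights):
    try:
        return np.average(scores, weights=weights)
    except ZeroDivisionError:
        # np.average raises ZeroDivisionError when all weights along the
        # axis are zero; fall back to an unweighted average in that case.
        return np.average(scores)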

Comment on lines 758 to 759
Note that if zero_division is np.nan, such values will be excluded
from the average.
Member

I think we should move this into the zero_division description. Currently, when reading that description, the reader needs to scroll up to see what zero_division=np.nan does.

Comment on lines 1306 to 1307
Note that if zero_division is np.nan, such values will be excluded
from the average.
Member

Same here, I think we can place this in the zero_division description.

Similar comment for the other docstrings.

- move comment to zero_division
- try/except in nan_average
@glemaitre self-requested a review January 23, 2023 09:30
Comment on lines 728 to 760
@pytest.mark.parametrize("zero_division", [0, 1, np.nan])
@pytest.mark.parametrize(
    "y_true, y_pred",
    [
        ([0], [1]),
        ([0, 0], [1, 1]),
        ([], []),
    ],
)
def test_matthews_corrcoef_nan(zero_division, y_true, y_pred):
    with warnings.catch_warnings(record=True) as record:
        mcc = matthews_corrcoef(y_true, y_pred, zero_division=zero_division)
    assert not record

    if np.isnan(zero_division):
        assert np.isnan(mcc)
    else:
        assert mcc == zero_division


@pytest.mark.parametrize(
    "y_true, y_pred",
    [
        ([0], [1]),
        ([0, 0], [1, 1]),
        ([], []),
    ],
)
def test_matthews_corrcoef_nan_warn(y_true, y_pred):
    with warnings.catch_warnings(record=True) as record:
        mcc = matthews_corrcoef(y_true, y_pred, zero_division="warn")
    assert len(record) == 1
    assert mcc == 0.0
Member

What I meant is the following general test:

@pytest.mark.parametrize("zero_division", [0, 1, np.nan])
@pytest.mark.parametrize("y_true, y_pred", [([0], [0]), ([], [])])
@pytest.mark.parametrize("metric", [
    jaccard_score,
    matthews_corrcoef,
    f1_score,
    partial(fbeta_score, beta=1),
    precision_score,
    recall_score,
])
def test_zero_division_nan_no_warning(metric, y_true, y_pred, zero_division):
    """Check the behaviour of `zero_division` when setting to 0, 1 or np.nan.
    No warnings should be raised.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        result = metric(y_true, y_pred, zero_division=zero_division)

    if np.isnan(zero_division):
        assert np.isnan(result)
    else:
        assert result == zero_division


@pytest.mark.parametrize("y_true, y_pred", [([0], [0]), ([], [])])
@pytest.mark.parametrize("metric", [
    jaccard_score,
    matthews_corrcoef,
    f1_score,
    partial(fbeta_score, beta=1),
    precision_score,
    recall_score,
])
def test_zero_division_nan_warning(metric, y_true, y_pred):
    """Check the behaviour of `zero_division` when setting to "warn".
    A `UndefinedMetricWarning` should be raised.
    """
    with pytest.warns(UndefinedMetricWarning):
        result = metric(y_true, y_pred, zero_division="warn")
    assert result == 0.0

Here, I don't include precision_recall_fscore_support and classification_report because they are a bit different.

The idea is not to test all potential cases that trigger the warning but the general behaviour when the UndefinedMetricWarning is triggered.

@@ -1678,36 +1712,53 @@ def test_precision_recall_f1_score_multilabel_2():


@ignore_warnings
-@pytest.mark.parametrize("zero_division", ["warn", 0, 1])
+@pytest.mark.parametrize("zero_division", ["warn", 0, 1, np.nan])
Member

I think that we can directly set the expected behaviour in the parametrization, without any programming.

Suggested change
-@pytest.mark.parametrize("zero_division", ["warn", 0, 1, np.nan])
+@pytest.mark.parametrize(
+    "zero_division, zero_division_expected",
+    [("warn", 0), (0, 0), (1, 1), (np.nan, np.nan)],
+)
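The test body can then branch on the expected value rather than re-deriving it, along these lines (a hypothetical sketch of the assertion, needed because nan compares unequal to itself):

import numpy as np


def _assert_expected(result, zero_division_expected):
    if np.isnan(zero_division_expected):
        assert np.isnan(result)
    else:
        assert result == zero_division_expected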

Comment on lines 127 to 128
:mod:`sklearn.metrics`
.................................
Member

Suggested change
-:mod:`sklearn.metrics`
-.................................
+:mod:`sklearn.metrics`
+......................

@glemaitre (Member)

So I looked a bit at the other classification metrics and I have the following questions:

  • accuracy_score and balanced_accuracy will raise a warning when y_true and y_pred are empty. We are not really consistent with the other metrics then.
  • cohen_kappa_score seems to be a good candidate that should be added to the list of metrics accepting a zero_division parameter.
  • class_likelihood_ratio has a raise_warning parameter that is inconsistent with the current PR. It should not be addressed in this PR, but we should investigate whether the deprecation cost is lower than making this metric consistent (and check that it indeed makes sense).

@thomasjpfan could you provide your input on those three questions?
Otherwise the PR is in a good state on my side.

@marctorsoc (Contributor Author)

> So I looked a bit at the other classification metrics and I have the following questions:
>
>   • accuracy_score and balanced_accuracy will raise a warning when y_true and y_pred are empty. We are not really consistent with the other metrics then.
>   • cohen_kappa_score seems to be a good candidate that should be added to the list of metrics accepting a zero_division parameter.
>   • class_likelihood_ratio has a raise_warning parameter that is inconsistent with the current PR. It should not be addressed in this PR, but we should investigate whether the deprecation cost is lower than making this metric consistent (and check that it indeed makes sense).
>
> @thomasjpfan could you provide your input on those three questions? Otherwise the PR is in a good state on my side.

Added (2). (1) and (3) seemed more convoluted. Let me know your thoughts.

@marctorsoc (Contributor Author)

@glemaitre @thomasjpfan kind reminder about this :)

@glemaitre (Member)

The CI is broken. @marctorsoc, could you solve that?

@marctorsoc (Contributor Author)

> The CI is broken. @marctorsoc, could you solve that?

@glemaitre done

@thomasjpfan (Member) commented Jan 31, 2023

@glemaitre I'd say to standardize on zero_division as much as possible, which includes all the metrics you suggested in #23183 (comment). I think it's worth deprecating raise_warning in class_likelihood_ratio.

As for this PR, I think it covers a lot, which makes it harder to review. If we can break up this PR into smaller PRs that add zero_division one metric at a time, I think it will speed up the process.

@marctorsoc (Contributor Author)

> @glemaitre I'd say to standardize on zero_division as much as possible, which includes all the metrics you suggested in #23183 (comment). I think it's worth deprecating raise_warning in class_likelihood_ratio.
>
> As for this PR, I think it covers a lot, which makes it harder to review. If we can break up this PR into smaller PRs that add zero_division one metric at a time, I think it will speed up the process.

@thomasjpfan I don't think one metric per PR is (a) feasible or (b) worth it.

(a) because e.g. precision and recall both affect classification_report; (b) because it would mean 10+ PRs with very few changes each.

I propose the following:

  1. PR to add zero_division=np.nan to precision, recall, f1, fbeta_score, precision_recall_fscore_support and classification_report
  2. PR to add zero_division to matthews_corrcoef and cohen_kappa_score
  3. PR to fix classification_report so that empty input will return np.nan
  4. PR to add zero_division for accuracy_score and balanced_accuracy, which currently raise a warning when y_true and y_pred are empty
  5. PR to add zero_division for class_likelihood_ratios and remove raise_warning

I can take care of most if not all of these, but first I want a green light, as this will take me some time.

@glemaitre (Member)

I can commit time to reviewing individual PRs as proposed by @marctorsoc. True that it is difficult to dissociate all metrics.

@marctorsoc changed the title from "ENH add zero_division parameter to classification metrics" to "ENH add zero_division=nan for classification metrics" on Feb 2, 2023
@marctorsoc (Contributor Author)

@glemaitre @thomasjpfan please see #25531

@erikhuck

@glemaitre are we able to get this merged at some point or is it being split into different pull requests?


@glemaitre (Member)

It has been split into different pull requests.

@marctorsoc (Contributor Author)

I'll close this PR to avoid confusion.

Successfully merging this pull request may close these issues:

  • Add zero_division None or np.nan

4 participants