FIX Fix recall in multilabel classification when true labels are all negative by varunagrawal · Pull Request #19085 · scikit-learn/scikit-learn · GitHub

FIX Fix recall in multilabel classification when true labels are all negative #19085


Merged
merged 11 commits into scikit-learn:main on Mar 25, 2022

Conversation

varunagrawal
Contributor
@varunagrawal varunagrawal commented Dec 31, 2020

Reference Issues/PRs

Fixes #8245

What does this implement/fix? Explain your changes.

When all the y_true labels are negative, precision_recall_curve returns nan because recall is set to nan instead of 1. This happens because of the direct division of the tps vector by tps[-1], which is 0 in this case.

This fix checks whether tps[-1] is 0 and, if so, sets the recall to 1 directly since there are no true positives or false negatives; otherwise recall is calculated as normal.

I updated and added tests to check for this case.
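For reference, a minimal reproduction sketch of the behavior being fixed (the all-negative inputs below are illustrative):

import numpy as np
from sklearn.metrics import precision_recall_curve

# Every ground-truth label is negative for this class, so tps[-1] == 0.
y_true = np.array([0, 0, 0, 0])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
# Before this fix: recall contains nan (the 0 / 0 division described above).
# With this fix: recall is all ones, since there are no positives to miss.
print(recall)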

Any other comments?

Please refer to the issue thread for more discussion points.


@varunagrawal varunagrawal changed the title from "Fix F1 score when all true labels are negative" to "Fix recall in multilabel classification when true labels are all negative" on Dec 31, 2020
@cmarmo
Contributor
cmarmo commented Jan 1, 2021

Thanks @varunagrawal for following up on #14621. The failing check is unrelated to your PR: it will be resolved by merging #18930.

Base automatically changed from master to main January 22, 2021 10:53
@rayhou0710

Is there a plan to merge this commit?

@cmarmo cmarmo added the Bug label May 18, 2021
@cmarmo cmarmo added this to the 1.0 milestone May 18, 2021
@cmarmo
Contributor
cmarmo commented May 18, 2021

Hi @varunagrawal, thanks for your patience! Please add an entry to the change log at doc/whats_new/v1.0.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.

@adrinjalali
Member

@amueller @lesteve you've been involved in the original issue, pinging you in case you wanna leave a review. The issue is tagged in the v1.0 milestone.

@varunagrawal
Contributor Author

@cmarmo done and done.

@adrinjalali adrinjalali modified the milestones: 1.0, 1.1 Sep 7, 2021
Member
@thomasjpfan thomasjpfan left a comment

Thank you for the PR @varunagrawal !

@@ -856,7 +856,7 @@ def precision_recall_curve(y_true, probas_pred, *, pos_label=None, sample_weight

     precision = tps / (tps + fps)
     precision[np.isnan(precision)] = 0
-    recall = tps / tps[-1]
+    recall = np.ones(tps.size) if tps[-1] == 0 else tps / tps[-1]
Member

In precision_recall_fscore_support, there is a zero_division parameter that controls how we handle the edge case. This parameter is also used in recall_score and precision_score. If we want to be consistent with precision_recall_fscore_support, we would need to add a zero_division and use the same semantics.

What do you think?
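For context, a small sketch of the zero_division semantics referred to here (the toy labels are made up for illustration):

from sklearn.metrics import recall_score

# No positive ground-truth labels, so recall is 0 / 0 and therefore ill-defined.
y_true = [0, 0, 0, 0]
y_pred = [0, 0, 0, 1]

# zero_division controls the value returned in the ill-defined case.
print(recall_score(y_true, y_pred, zero_division=0))  # 0.0
print(recall_score(y_true, y_pred, zero_division=1))  # 1.0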

Contributor

@thomasjpfan this issue has been sitting unfixed for years. I think this fix should be merged as-is before it bit-rots, with a separate issue opened for semantic consistency.

Contributor Author

I agree with @crypdick on this. It's been over 3 years for what is a very simple bugfix, and we can track semantic consistency via another issue. That issue can be tackled by someone else, potentially from the sklearn team, which would get the ball rolling sooner.

Member

Okay, let's keep the recall set to one. May we update precision_recall_curve's docstring to reflect this behavior?

Member
@thomasjpfan thomasjpfan left a comment

Thank you for your patience and working on this PR.

Comment on lines 967 to 968
y_true = np.array([[1, 0], [0, 1]])
y_score = np.array([[0, 1], [1, 0]])
Member

I think we can remove this, since the input is exactly the same as the one above it.

Comment on lines 626 to 627
- |Fix| Fix recall in multilabel classification when true labels are all negative.
:pr:`19085` by :user:`Varun Agrawal <varunagrawal>`.
Member

We can adjust the what's new entry to state which function is being fixed.

- |Fix| Fixes `average_precision_score` for multilabel classification when true labels are
   all negative. :pr:`19085` by :user:`Varun Agrawal <varunagrawal>`.

There also needs to be an entry for precision_recall_curve to describe the new behavior.

These entries need to be moved to doc/whats_new/v1.1.rst.


@jeremiedbb
Member

I merged main and addressed the comments from the reviews, along with the comments I was going to make myself. During an in-person discussion, @glemaitre suggested warning when there is no positive class in y_true.
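For illustration, a rough sketch of what such a warning could look like (a hypothetical helper, not the exact code that was merged):

import warnings

import numpy as np


def recall_with_warning(tps):
    # Hypothetical helper: when y_true contains no positive class,
    # tps[-1] == 0, so warn and set recall to one for all thresholds.
    if tps[-1] == 0:
        warnings.warn(
            "No positive class found in y_true, "
            "recall is set to one for all thresholds."
        )
        return np.ones(tps.size)
    return tps / tps[-1]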

Member
@jeremiedbb jeremiedbb left a comment

LGTM. @thomasjpfan and @glemaitre, do you want to take a look?

@jeremiedbb
Member

Note that setting the recall to one in that case leads to the following behavior, which seems acceptable and expected.

import numpy as np
from sklearn.metrics import precision_recall_curve
import matplotlib.pyplot as plt
from sklearn.metrics import PrecisionRecallDisplay

y_true = np.array([0, 0, 0, 0])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

display = PrecisionRecallDisplay.from_predictions(y_true, y_scores)
plt.show()

[screenshot: the resulting precision-recall display]

Member
@thomasjpfan thomasjpfan left a comment

Minor nits. I'm still okay with this solution.

@thomasjpfan thomasjpfan changed the title from "Fix recall in multilabel classification when true labels are all negative" to "FIX Fix recall in multilabel classification when true labels are all negative" on Mar 24, 2022
Member
@glemaitre glemaitre left a comment

LGTM, waiting for the CI to be green.

@jeremiedbb jeremiedbb merged commit ea0571f into scikit-learn:main Mar 25, 2022
@jeremiedbb
Member

Thanks @varunagrawal !

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Apr 6, 2022
…negative (scikit-learn#19085)

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: jeremiedbb <jeremiedbb@yahoo.fr>
seatim pushed a commit to seatim/birdclef2023 that referenced this pull request Apr 12, 2023
Note that scikit-learn>=1.1 is required in order to have valid values for
average precision score.  That is due to version 1.1+ of scikit-learn including
this fix:

    scikit-learn/scikit-learn#19085

NB: version 1.1+ of scikit-learn requires python 3.8+.
Successfully merging this pull request may close these issues.

average_precision_score does not return correct AP when all negative ground truth labels