FEA Adds `decision_threshold_curve` function by lucyleeow · Pull Request #31338 · scikit-learn/scikit-learn

FEA Adds decision_threshold_curve function #31338


Open · wants to merge 53 commits into base: main

Conversation

@lucyleeow (Member) commented May 8, 2025

Reference Issues/PRs

closes #25639 (supersedes)
closes #21391

Thought it would be easier to open a new PR and there were no long discussions on the old PR (#25639).

What does this implement/fix? Explain your changes.

Adds a function that takes a scoring function, `y_true`, `y_score` and `thresholds`, and outputs the score per threshold.

The intention is to later add a new display class using this function, allowing us to plot the metric score per threshold, e.g.

[image: example plot of a metric score per decision threshold]
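To make the idea concrete, here is a minimal, self-contained sketch of the behaviour described above. The helper name `score_per_threshold` and its signature are illustrative only; they are not the API proposed in this PR:

```python
# Illustrative sketch only: evaluate a metric at a range of decision
# thresholds applied to `y_score`. Function name and signature are
# hypothetical, not the API added by this PR.
import numpy as np
from sklearn.metrics import f1_score


def score_per_threshold(score_func, y_true, y_score, n_thresholds=10):
    thresholds = np.linspace(y_score.min(), y_score.max(), n_thresholds)
    scores = np.array(
        [score_func(y_true, (y_score >= t).astype(int)) for t in thresholds]
    )
    return scores, thresholds


y_true = np.array([0, 0, 1, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.9])  # e.g. predict_proba[:, 1]
scores, thresholds = score_per_threshold(f1_score, y_true, y_score)
```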

Uses the `_CurveScorer`, as suggested by @glemaitre. Refactors out a new `_scores_from_predictions` static method that takes the predictions. The old `_score` takes the estimator, calculates `y_score` and passes it to the new `_scores_from_predictions`.

Notes:

  • `_scores_from_predictions` - the name is inspired by the display class method `from_predictions`, but happy to change
  • it did not make sense to use `from_scorer` because we are not using a scorer (which has the signature `callable(estimator, X, y)`); we are using a `scoring_function` with signature `score_func(y_true, y_pred, **kwargs)`
    • we instantiate `_CurveScorer` directly instead, and then call `_scores_from_predictions` in `decision_threshold_curve`
    • decided to make `_scores_from_predictions` a static method, but I also could have made it an instance method, and in `decision_threshold_curve` instantiated `_CurveScorer` directly first (not via `from_scorer`). Only went with the staticmethod path because I initially didn't register that `from_scorer` instantiates differently than instantiating directly via `_CurveScorer`. Not 100% sure what is best here. [realised that an instance method is nicer to avoid having too many params in `decision_threshold_curve`, and it avoids some lines of code (use `self.xx` directly, instead of passing `self.xx` to `_scores_from_predictions`); see the sketch after this list]
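For reference, a rough sketch of the design choice in the last bullet. This is not the actual `_CurveScorer` code; the class and method names below are simplified stand-ins:

```python
# Hypothetical illustration of "static method vs. instance method", not the
# real scikit-learn `_CurveScorer` implementation.
import numpy as np


class CurveScorerSketch:
    def __init__(self, score_func, thresholds, **kwargs):
        self._score_func = score_func
        self._thresholds = thresholds
        self._kwargs = kwargs

    @staticmethod
    def _scores_from_predictions_static(score_func, y_true, y_score, thresholds, **kwargs):
        # Static path: every parameter has to be threaded through explicitly,
        # which inflates the calling function's signature.
        scores = [
            score_func(y_true, (y_score >= t).astype(int), **kwargs)
            for t in thresholds
        ]
        return np.array(scores), thresholds

    def _scores_from_predictions(self, y_true, y_score):
        # Instance-method path: reuse attributes set in __init__, so the
        # public function only needs to build the instance and call this.
        return self._scores_from_predictions_static(
            self._score_func, y_true, y_score, self._thresholds, **self._kwargs
        )
```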

Any other comments?

cc @glemaitre

Lots more to do, but should get implementation right first.

To do:

  • Add tests
  • Add example
  • Review _decision_threshold.py module docstring

github-actions bot commented May 8, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 577ea24. Link to the linter CI: here

labels : array-like, default=None
    Class labels. If `None`, inferred from `y_true`.

pos_label : int, float, bool or str, default=None
@lucyleeow (Member Author):

The other main question is whether we need to allow the user to set this, even when scoring_function does not take a pos_label parameter?

I think we should have this, because pos_label is passed to _threshold_scores_to_class_labels, and a user should be able to control this.
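For illustration, the reason `pos_label` matters when mapping thresholded scores back to class labels. The helper below is a hypothetical stand-in, not the private `_threshold_scores_to_class_labels`:

```python
# Hypothetical stand-in showing why `pos_label` is needed when converting
# thresholded scores back into class labels.
import numpy as np


def scores_to_class_labels(y_score, threshold, classes, pos_label):
    # Samples whose score clears the threshold get the positive label,
    # the others get the remaining (negative) label.
    classes = np.asarray(classes)
    neg_label = classes[classes != pos_label][0]
    return np.where(y_score >= threshold, pos_label, neg_label)


y_score = np.array([0.2, 0.7, 0.9])
print(scores_to_class_labels(y_score, 0.5, classes=["neg", "pos"], pos_label="pos"))
# -> ['neg' 'pos' 'pos']
```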

@glemaitre self-requested a review May 9, 2025 19:06
@@ -1,6 +1,6 @@
.. currentmodule:: sklearn.model_selection

.. _TunedThresholdClassifierCV:
.. _threshold_tunning:
@lucyleeow (Member Author):

I referenced this page starting here for `decision_threshold_curve` because I thought this first paragraph was appropriate for context, not just the section I added. Not 100% on this though, and happy to change.

@@ -1,6 +1,6 @@
.. currentmodule:: sklearn.model_selection

.. _TunedThresholdClassifierCV:
.. _threshold_tunning:

==================================================
Tuning the decision threshold for class prediction
@lucyleeow (Member Author) commented May 15, 2025:

@glemaitre Can't comment where this is relevant (after L49), but I wonder if it would be interesting to add another scenario where threshold tuning may be of interest - imbalanced datasets?

good, or a loss function, meaning low is good. In the latter case, the
output of `score_func` will be sign-flipped.

labels : array-like, default=None
@lucyleeow (Member Author):

`_CurveScorer` uses the term "classes", but "labels" is consistent with what is used for other classification metrics, so I chose this.

between the minimum and maximum of `y_score`. If an array-like, it will be
used as the thresholds.

greater_is_better : bool, default=True
@lucyleeow (Member Author):

Again, this is consistent with the term used in other metrics, so I avoided using "sign".
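A tiny example of the sign-flip convention mentioned in the docstring snippet above, using `log_loss` as the loss (illustrative only; not taken from this PR's code):

```python
# When the metric is a loss (lower is better), its output is negated so
# that "greater is better" holds uniformly across thresholds.
from sklearn.metrics import log_loss

y_true = [0, 1, 1]
y_pred_proba = [0.1, 0.8, 0.6]
loss = log_loss(y_true, y_pred_proba)   # lower is better
score = -loss                           # sign-flipped: higher is better
```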

Comment on lines +1146 to +1160
# This could also be done in `decision_threshold_curve`, not sure which
# is better
y_true_unique = cached_unique(y_true)
if classes is None:
    classes = y_true_unique
# not sure if this separate error msg needed.
# there is the possibility that set(classes) != set(y_true_unique) fails
# because `y_true` only contains one class.
if len(y_true_unique) == 1:
    raise ValueError("`y_true` only contains one class label.")
if set(classes) != set(y_true_unique):
    raise ValueError(
        f"`classes` ({classes}) is not equal to the unique values found in "
        f"`y_true` ({y_true_unique})."
    )
@lucyleeow (Member Author):

These checks could be done in decision_threshold_curve instead, not sure which is better.

    for th in potential_thresholds
]
return np.array(score_thresholds), potential_thresholds
# why 'potential' ?
@lucyleeow (Member Author):

Just for my education, why use the term "potential" in "potential_thresholds"? Is it because there is a possibility that a threshold is redundant, because the predicted labels are the same for adjacent thresholds?
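A small example of the redundancy I am guessing at here (illustrative only): two adjacent thresholds can produce identical predicted labels, and therefore identical scores:

```python
# No sample's score lies between 0.5 and 0.6, so both thresholds yield the
# same predicted labels (and hence the same metric value).
import numpy as np

y_score = np.array([0.1, 0.4, 0.8])
for t in (0.5, 0.6):
    print(t, (y_score >= t).astype(int))  # both print [0 0 1]
```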

@lucyleeow (Member Author) commented:

@glemaitre I've highlighted questions in the code.

Tests still to be added for decision_threshold_curve. Depending on where the validation checks are done, I don't think we need many value checks, as the function just uses _CurveScorer methods, which should be tested elsewhere.


Successfully merging this pull request may close these issues: add sklearn.metrics Display class to plot Precision/Recall/F1 for probability thresholds