FEA Adds `decision_threshold_curve` function by lucyleeow · Pull Request #31338 · scikit-learn/scikit-learn

FEA Adds decision_threshold_curve function #31338


Open · wants to merge 53 commits into base: main

Conversation

@lucyleeow (Member) commented May 8, 2025

Reference Issues/PRs

closes #25639 (supersedes)
closes #21391

Thought it would be easier to open a new PR and there were no long discussions on the old PR (#25639).

What does this implement/fix? Explain your changes.

Adds a function that takes a scoring function, `y_true`, `y_score` and `thresholds`, and outputs the score per threshold.

The intention is to later add a new display class using this function, allowing us to plot the metric score per threshold, e.g.

[image: example plot of a metric score per decision threshold]
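To make the idea concrete, here is a minimal, self-contained sketch of the behaviour described above. The helper name `score_per_threshold` and its signature are illustrative only; they are not the API proposed in this PR:

```python
# Illustrative sketch only: evaluate a metric at a range of decision
# thresholds applied to `y_score`. Function name and signature are
# hypothetical, not the API added by this PR.
import numpy as np
from sklearn.metrics import f1_score


def score_per_threshold(score_func, y_true, y_score, n_thresholds=10):
    thresholds = np.linspace(y_score.min(), y_score.max(), n_thresholds)
    scores = np.array(
        [score_func(y_true, (y_score >= t).astype(int)) for t in thresholds]
    )
    return scores, thresholds


y_true = np.array([0, 0, 1, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.9])  # e.g. predict_proba[:, 1]
scores, thresholds = score_per_threshold(f1_score, y_true, y_score)
```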

Uses the `_CurveScorer`, as suggested by @glemaitre. Refactors out a new `_scores_from_predictions` static method that takes the predictions. The old `_score` takes the estimator, calculates `y_score` and passes it to the new `_scores_from_predictions`.

Notes:

  • `_scores_from_predictions` - the name is inspired by the display class method `from_predictions`, but happy to change
  • it did not make sense to use `from_scorer` because we are not using a scorer (which has the signature `callable(estimator, X, y)`); we are using a `scoring_function` with signature `score_func(y_true, y_pred, **kwargs)`
    • we instantiate `_CurveScorer` directly instead, and then call `_scores_from_predictions` in `decision_threshold_curve`
    • decided to make `_scores_from_predictions` a static method, but I also could have made it an instance method, and in `decision_threshold_curve` instantiated `_CurveScorer` directly first (not via `from_scorer`). Only went with the staticmethod path because I initially didn't register that `from_scorer` instantiates differently than instantiating directly via `_CurveScorer`. Not 100% sure what is best here. [realised that an instance method is nicer to avoid having too many params in `decision_threshold_curve`, and it avoids some lines of code (use `self.xx` directly, instead of passing `self.xx` to `_scores_from_predictions`); see the sketch after this list]
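For reference, a rough sketch of the design choice in the last bullet. This is not the actual `_CurveScorer` code; the class and method names below are simplified stand-ins:

```python
# Hypothetical illustration of "static method vs. instance method", not the
# real scikit-learn `_CurveScorer` implementation.
import numpy as np


class CurveScorerSketch:
    def __init__(self, score_func, thresholds, **kwargs):
        self._score_func = score_func
        self._thresholds = thresholds
        self._kwargs = kwargs

    @staticmethod
    def _scores_from_predictions_static(score_func, y_true, y_score, thresholds, **kwargs):
        # Static path: every parameter has to be threaded through explicitly,
        # which inflates the calling function's signature.
        scores = [
            score_func(y_true, (y_score >= t).astype(int), **kwargs)
            for t in thresholds
        ]
        return np.array(scores), thresholds

    def _scores_from_predictions(self, y_true, y_score):
        # Instance-method path: reuse attributes set in __init__, so the
        # public function only needs to build the instance and call this.
        return self._scores_from_predictions_static(
            self._score_func, y_true, y_score, self._thresholds, **self._kwargs
        )
```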

Any other comments?

cc @glemaitre

Lots more to do, but should get implementation right first.

To do:

  • Add tests
  • Add example
  • Review _decision_threshold.py module docstring

github-actions bot commented May 8, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 577ea24. Link to the linter CI: here

labels : array-like, default=None
    Class labels. If `None`, inferred from `y_true`.

pos_label : int, float, bool or str, default=None
@lucyleeow (Member Author):

The other main question is whether we need to allow the user to set this, even when scoring_function does not take a pos_label parameter?

I think we should have this, because pos_label is passed to _threshold_scores_to_class_labels, and a user should be able to control this.
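For illustration, the reason `pos_label` matters when mapping thresholded scores back to class labels. The helper below is a hypothetical stand-in, not the private `_threshold_scores_to_class_labels`:

```python
# Hypothetical stand-in showing why `pos_label` is needed when converting
# thresholded scores back into class labels.
import numpy as np


def scores_to_class_labels(y_score, threshold, classes, pos_label):
    # Samples whose score clears the threshold get the positive label,
    # the others get the remaining (negative) label.
    classes = np.asarray(classes)
    neg_label = classes[classes != pos_label][0]
    return np.where(y_score >= threshold, pos_label, neg_label)


y_score = np.array([0.2, 0.7, 0.9])
print(scores_to_class_labels(y_score, 0.5, classes=["neg", "pos"], pos_label="pos"))
# -> ['neg' 'pos' 'pos']
```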

@glemaitre self-requested a review May 9, 2025 19:06
@@ -1,6 +1,6 @@
.. currentmodule:: sklearn.model_selection

.. _TunedThresholdClassifierCV:
.. _threshold_tunning:
@lucyleeow (Member Author):

I referenced this page starting here for `decision_threshold_curve` because I thought this first paragraph was appropriate for context, not just the section I added. Not 100% on this though, and happy to change.

@@ -1,6 +1,6 @@
.. currentmodule:: sklearn.model_selection

.. _TunedThresholdClassifierCV:
.. _threshold_tunning:

==================================================
Tuning the decision threshold for class prediction
@lucyleeow (Member Author) commented May 15, 2025:

@glemaitre Can't comment where this is relevant (after L49), but I wonder if it would be interesting to add another scenario where threshold tuning may be of interest - imbalanced datasets?

good, or a loss function, meaning low is good. In the latter case, the
output of `score_func` will be sign-flipped.

labels : array-like, default=None
@lucyleeow (Member Author):

`_CurveScorer` uses the term "classes", but "labels" is consistent with what is used for other classification metrics, so I chose this.

between the minimum and maximum of `y_score`. If an array-like, it will be
used as the thresholds.

greater_is_better : bool, default=True
@lucyleeow (Member Author):

Again, this is consistent with the term used in other metrics, so I avoided using "sign".
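A tiny example of the sign-flip convention mentioned in the docstring snippet above, using `log_loss` as the loss (illustrative only; not taken from this PR's code):

```python
# When the metric is a loss (lower is better), its output is negated so
# that "greater is better" holds uniformly across thresholds.
from sklearn.metrics import log_loss

y_true = [0, 1, 1]
y_pred_proba = [0.1, 0.8, 0.6]
loss = log_loss(y_true, y_pred_proba)   # lower is better
score = -loss                           # sign-flipped: higher is better
```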

Comment on lines +1146 to +1160
# This could also be done in `decision_threshold_curve`, not sure which
# is better
y_true_unique = cached_unique(y_true)
if classes is None:
    classes = y_true_unique
# not sure if this separate error msg needed.
# there is the possibility that set(classes) != set(y_true_unique) fails
# because `y_true` only contains one class.
if len(y_true_unique) == 1:
    raise ValueError("`y_true` only contains one class label.")
if set(classes) != set(y_true_unique):
    raise ValueError(
        f"`classes` ({classes}) is not equal to the unique values found in "
        f"`y_true` ({y_true_unique})."
    )
@lucyleeow (Member Author):

These checks could be done in decision_threshold_curve instead, not sure which is better.

    for th in potential_thresholds
]
return np.array(score_thresholds), potential_thresholds
# why 'potential' ?
@lucyleeow (Member Author):

Just for my education, why use the term "potential" in "potential_thresholds"? Is it because there is a possibility that a threshold is redundant, because the predicted labels are the same for adjacent thresholds?
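A small example of the redundancy I am guessing at here (illustrative only): two adjacent thresholds can produce identical predicted labels, and therefore identical scores:

```python
# No sample's score lies between 0.5 and 0.6, so both thresholds yield the
# same predicted labels (and hence the same metric value).
import numpy as np

y_score = np.array([0.1, 0.4, 0.8])
for t in (0.5, 0.6):
    print(t, (y_score >= t).astype(int))  # both print [0 0 1]
```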

@lucyleeow (Member Author) commented:

@glemaitre I've highlighted questions in the code.

Tests still to be added for decision_threshold_curve. Depending on where the validation checks are done, I don't think we need many value checks, as the function just uses _CurveScorer methods, which should be tested elsewhere.


Successfully merging this pull request may close these issues: add sklearn.metrics Display class to plot Precision/Recall/F1 for probability thresholds