[MRG] Multi-class Brier Score Loss #18699
Conversation
I think this is the best course of action. Introduce a new … Also when trying to call … If the user decides to use …
Should be ready for a review.
Thanks for bearing with me, here is another round of review comments:
sklearn/metrics/_classification.py
the probabilities provided are assumed to be that of the
positive class. The labels in ``y_pred`` are assumed to be
ordered alphabetically, as done by
:class:`preprocessing.LabelBinarizer`.
alphabetically => lexicographically.
I find it misleading that we do not respect the order implied by labels when passed. Maybe we should raise a ValueError or a warning when the user passes labels= that does not respect the lexicographical order.
Note that if we decide to do something else w.r.t. the handling of the y_prob class order, we would have to update the log_loss metric accordingly.
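For illustration (a standalone snippet, not part of this PR), LabelBinarizer always sorts the classes it sees, regardless of the order in which they first appear:

```python
from sklearn.preprocessing import LabelBinarizer

# classes_ ends up lexicographically sorted, not in first-seen order
lb = LabelBinarizer().fit(["spam", "ham", "eggs"])
print(lb.classes_)  # ['eggs' 'ham' 'spam']
```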
As you correctly identified, this implementation of multiclass_brier_score_loss shadows the one from log_loss. Any change we make will have to be made to both functions.
However, both functions use LabelBinarizer to infer labels, so it seems like any warning/error that we raise should be raised there. Right?
On second thought, this is a misuse of LabelBinarizer.
We have the following options:
- Fix multiclass_brier_score so that it respects labels
- Keep multiclass_brier_score as is, but raise a warning/error

If we go with the latter, then we should do the same with log_loss as well. Or should that be a different PR as well?
labels : array-like, default=None
    If not provided, labels will be inferred from y_true. If ``labels``
    is ``None`` and ``y_prob`` has shape (n_samples,) the labels are
    assumed to be binary and are inferred from ``y_true``.
We need to pass an explicit pos_label if y_true has object or string values, similar to fbeta_score I believe. @glemaitre let me know if I am missing something.
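For context, a quick illustrative snippet (not from this PR) of how brier_score_loss already handles string labels via pos_label:

```python
from sklearn.metrics import brier_score_loss

y_true = ["spam", "ham", "ham", "spam"]
y_prob = [0.9, 0.1, 0.2, 0.8]  # estimated probability of "spam"

# With string labels the positive class is ambiguous, so pos_label is needed.
print(brier_score_loss(y_true, y_prob, pos_label="spam"))  # 0.025
```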
But then log_loss has the same problem... maybe for a subsequent PR then.
sklearn/metrics/_classification.py
else:
    raise ValueError(f'The number of classes in labels is different '
                     f'from that in y_prob. Classes found in '
                     f'labels: {lb.classes_}')
If I am not mistaken, all this input validation code is duplicated from the log_loss function. Could you please factorize it into a private helper method:
y_true, y_prob, labels = _validate_multiclass_probabilistic_prediction(y_true, y_prob, labels)
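A minimal sketch of what such a helper might look like, mirroring the existing validation in log_loss (names and details are illustrative, not a definitive implementation):

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

def _validate_multiclass_probabilistic_prediction(y_true, y_prob, labels=None):
    # Binarize y_true against the given (or inferred) labels.
    lb = LabelBinarizer()
    lb.fit(labels if labels is not None else y_true)
    transformed_labels = lb.transform(y_true)

    # Expand the binary case to two explicit columns.
    if transformed_labels.shape[1] == 1:
        transformed_labels = np.append(
            1 - transformed_labels, transformed_labels, axis=1
        )

    y_prob = np.asarray(y_prob)
    if y_prob.ndim == 1:
        y_prob = np.column_stack([1 - y_prob, y_prob])

    # The probability columns must line up with the binarized classes.
    if y_prob.shape[1] != transformed_labels.shape[1]:
        raise ValueError(
            f"The number of classes in labels is different from that in "
            f"y_prob. Classes found in labels: {lb.classes_}"
        )
    return transformed_labels, y_prob, lb.classes_
```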
Not sure about the convention here. Should I put this function in sklearn.metrics._base or in sklearn.metrics._classification?
[[0.2, 0.7, 0.1],
 [0.6, 0.2, 0.2],
 [0.6, 0.1, 0.3]]),
.41333333)
Could you please split this test into two tests: one for invalid input and error messages, and the other for expected numerical values of valid inputs?
Also, could you please add a test that checks that perfect predictions lead to a Brier score of 0 and perfectly wrong predictions lead to a Brier score of 2 (both for a 2-class and a 3-class classification problem)?
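A possible shape for the second test, assuming the multiclass_brier_score_loss proposed in this PR (a sketch, not the final test):

```python
import numpy as np
# multiclass_brier_score_loss is the function proposed in this PR.

def test_brier_score_extremes():
    # 3-class problem: one-hot rows put probability 1 on the true class.
    y_true = np.array([0, 1, 2])
    perfect = np.eye(3)[y_true]
    # Rolling the columns puts probability 1 on a wrong class in every row.
    worst = np.roll(perfect, 1, axis=1)
    assert multiclass_brier_score_loss(y_true, perfect) == 0
    assert multiclass_brier_score_loss(y_true, worst) == 2
```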
👍
case as well. When used for the binary case, `multiclass_brier_score_loss`
returns a Brier score that is exactly twice the value returned by this
function.
Please recall the LaTeX equation in the docstring here.
I copied the equation for multiclass_brier_score_loss as asked. But the docstring for brier_score_loss did not include the alternate (i.e. original) formula, so I added that equation to the docstring as well.
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Apologies for the late reply. I implemented the changes you suggested. However, there are still a few things that need review/approval:
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1])
y_pred = np.array([0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0])

def add_dimension(y_prob):
    # Turn a 1-D vector of positive-class probabilities into an
    # (n_samples, 2) array of per-class probabilities.
    y_prob = y_prob[:, np.newaxis]
    y_prob = np.append(1 - y_prob, y_prob, axis=1)
    return y_prob

# This is what the test is doing, which gives different results
print(log_loss(y_pred, y_true))  # 15.542609297195847
print(log_loss(y_true, y_pred))  # 15.542649277067358

# Alternate way to run the same test, which gives the same results
print(log_loss(y_true, add_dimension(y_pred)))  # 15.542449377709806
print(log_loss(y_pred, add_dimension(y_true)))  # 15.542449377709806

I have removed …
Reference Issues/PRs
Resolves #16055
What does this implement/fix?
The original formulation of the Brier score inherently supports multiclass classification (source). This support is currently absent in scikit-learn, which restricts the Brier score to binary classification. This PR implements the Brier score for multi-class classification.
Notes/Open Questions
There are two different definitions of the Brier score. The one implemented in scikit-learn, which is only applicable to the binary case, is:

$$BS = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2$$

(where $\hat{y}_i$ is the predicted probability of the positive class)

Whereas the original, more general definition, applicable to both the binary and the multi-class case, is:

$$BS = \frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} (\hat{y}_{ik} - y_{ik})^2$$
The range of values for the former is [0, 1], whereas for the latter it is [0, 2]. For backwards compatibility, this PR uses the old definition for the binary case and the new one for the multi-class case. However, this implementation seems unintuitive, since the range of values changes between the two cases. I can see the following workarounds:
While the current PR implements method 1, I personally lean towards method 3.
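As a quick standalone check of the factor-of-two relationship between the two definitions in the binary case (plain NumPy, independent of this PR):

```python
import numpy as np

y_true = np.array([0, 1, 1, 0])
y_prob = np.array([0.1, 0.9, 0.8, 0.3])  # P(class 1)

# Current scikit-learn definition: mean squared error on the positive class.
binary_bs = np.mean((y_prob - y_true) ** 2)

# Original definition: squared errors summed over both class columns.
probs = np.column_stack([1 - y_prob, y_prob])
onehot = np.column_stack([1 - y_true, y_true])
multiclass_bs = np.mean(np.sum((probs - onehot) ** 2, axis=1))

print(np.isclose(multiclass_bs, 2 * binary_bs))  # True
```

This is why the binary range is [0, 1] while the multi-class range is [0, 2].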