[MRG] Multi-class Brier Score Loss by aggvarun01 · Pull Request #18699 · scikit-learn/scikit-learn

Closed

Conversation

@aggvarun01 commented Oct 28, 2020

Reference Issues/PRs

Resolves #16055

What does this implement/fix?

The original formulation of the Brier score inherently supports multiclass classification (source). This is currently absent in scikit-learn, which restricts the Brier score to binary classification. This PR implements the Brier score for multi-class classification.

Notes/Open Questions

There are two different definitions of the Brier score. The one implemented in scikit-learn, which is only applicable to the binary case, is:

    BS = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2

(where \hat{y}_i is the predicted probability of the positive class)

Whereas the original, more general formulation, which is applicable to both the binary and the multi-class case, is:

    BS = \frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} (p_{ik} - o_{ik})^2

(where p_{ik} is the predicted probability that sample i belongs to class k, and o_{ik} is 1 if sample i belongs to class k and 0 otherwise)
The range of values for the former is [0, 1], whereas for the latter it is [0, 2]. For backwards compatibility, this PR uses the old definition for the binary case and the new one for the multi-class case. However, this behavior seems unintuitive, since the range of values changes. I can see the following workarounds:

  1. Leave the PR as is, and document this weird behavior in the docstring (possibly in all caps and bold 🤪).
  2. Slowly deprecate the 'old' formula, so that even in the binary case the range of outputs is [0, 2]. This risks breaking backwards compatibility and possibly making a lot of people angry.
  3. Take inspiration from the R library and have a separate multi-class Brier score function.

While the current PR implements method 1, I personally lean towards method 3.
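
To make the range difference concrete, here is a minimal numpy sketch (my illustration, not part of this PR's diff) contrasting the two definitions on a binary problem: binary_brier follows the current brier_score_loss convention, and multiclass_brier is the general formulation above.

import numpy as np

def binary_brier(y_true, y_prob):
    # Mean squared difference between the positive-class probability and the 0/1 label.
    return np.mean((y_prob - y_true) ** 2)

def multiclass_brier(y_true, y_prob):
    # y_prob has shape (n_samples, n_classes); squared error summed over classes.
    onehot = np.eye(y_prob.shape[1])[y_true]
    return np.mean(np.sum((y_prob - onehot) ** 2, axis=1))

y_true = np.array([0, 1, 1, 0])
p_pos = np.array([0.1, 0.9, 0.8, 0.3])
p_full = np.column_stack([1 - p_pos, p_pos])

print(binary_brier(y_true, p_pos))       # 0.0375, in [0, 1]
print(multiclass_brier(y_true, p_full))  # 0.075, in [0, 2]: exactly twice the binary value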

@ogrisel (Member) commented Oct 29, 2020
  3. Take inspiration from the R library and have a separate multi-class Brier score function.

I think this is the best course of action. Introduce a new multiclass_brier_score function with the general formulation and keep the existing implementation untouched for binary classification problems. Cross-reference the two functions in each other's docstrings.

Also, when brier_score_loss is called on a multiclass y_true, make sure that the message of the raised ValueError directs the user to multiclass_brier_score instead.

If the user decides to use multiclass_brier_score on binary problems, I think this is fine. There is no need to raise any exception or warning, as using the [0, 2] range for binary problems can be considered legitimate.
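
A rough sketch of the suggested redirect (illustrative only; the guard and the exact wording are assumptions):

import numpy as np

def brier_score_loss_sketch(y_true, y_prob):
    # Hypothetical guard: keep brier_score_loss binary-only and point
    # multiclass users to the new function.
    if np.unique(y_true).shape[0] > 2:
        raise ValueError(
            "Only binary classification is supported by brier_score_loss; "
            "use multiclass_brier_score for multiclass targets."
        )
    # Binary [0, 1] definition (assumes y_true encoded as 0/1).
    return np.mean((np.asarray(y_prob) - np.asarray(y_true)) ** 2)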

@aggvarun01 (Author) commented Nov 2, 2020

@ogrisel

  1. Added a new multiclass_brier_score
  2. Cross-referenced the two functions in each other's docstrings
  3. Changed the error message for brier_score_loss to point to multiclass_brier_score
  4. Added a bunch of tests

Should be ready for a review.

@ogrisel changed the title from [WIP] Multi-class Brier Score Loss to [MRG] Multi-class Brier Score Loss on Nov 2, 2020
@ogrisel (Member) left a comment

Thanks for bearing with me, here is another round of review comments:

the probabilities provided are assumed to be that of the
positive class. The labels in ``y_pred`` are assumed to be
ordered alphabetically, as done by
:class:`preprocessing.LabelBinarizer`.
Member:

alphabetically => lexicographically.

I find it misleading that we do not respect the order implied by labels when it is passed. Maybe we should raise a ValueError or a warning when the user passes labels= in an order that does not respect the lexicographical order.

Note that if we decide to do something else w.r.t. the handling of the y_prob class order, we would have to update the log_loss metric accordingly.
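
For reference, LabelBinarizer always sorts the classes it finds, regardless of the order in which they appear (a quick demonstration, not part of the diff):

from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer().fit(["yes", "no", "maybe"])
print(lb.classes_)  # ['maybe' 'no' 'yes'] -- lexicographic order, not input order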

@aggvarun01 (Author) Nov 5, 2020:

As you correctly identified, this implementation of multiclass_brier_score_loss mirrors the one in log_loss. Any change we make will have to be made to both functions.

However, both functions use LabelBinarizer to infer labels, so it seems like any warning/error we raise should be raised there. Right?

@aggvarun01 (Author) Nov 5, 2020:

On second thought, this is a misuse of LabelBinarizer.

We have the following options:

  1. Fix multiclass_brier_score so that it respects labels.
  2. Keep multiclass_brier_score as is, but raise a warning/error (a sketch of this option follows below).

If we go with the latter, then we should do the same with log_loss as well. Or is that a different PR too?
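
A minimal sketch of what option 2 could look like (hypothetical helper name; not the PR's code):

import warnings

import numpy as np

def _warn_if_labels_unsorted(labels):
    # Warn when user-provided labels are not in the lexicographic order that
    # LabelBinarizer (and hence the columns of y_prob) will actually use.
    labels = np.asarray(labels)
    if not np.array_equal(labels, np.sort(labels)):
        warnings.warn(
            "labels is not sorted; the columns of y_prob are assumed to be "
            "in lexicographic order of the class labels, not in the order "
            "of labels."
        )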

labels : array-like, default=None
If not provided, labels will be inferred from y_true. If ``labels``
is ``None`` and ``y_prob`` has shape (n_samples,) the labels are
assumed to be binary and are inferred from ``y_true``.
Member:

We need to pass an explicit pos_label if y_true has object or string values, similar to fbeta_score I believe. @glemaitre let me know if I am missing something.

Member:

But then log_loss has the same problem... maybe for a subsequent PR then.

    else:
        raise ValueError(f'The number of classes in labels is different '
                         f'from that in y_prob. Classes found in '
                         f'labels: {lb.classes_}')
Member:

If I am not mistaken, all this input validation code is duplicated from the log_loss function. Could you please factor it out into a private helper:

y_true, y_prob, labels = _validate_multiclass_probabilistic_prediction(y_true, y_prob, labels)
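
One possible skeleton for such a helper, closely following the validation that log_loss already performs (a sketch only; the PR's actual helper may differ):

import numpy as np

from sklearn.preprocessing import LabelBinarizer

def _validate_multiclass_probabilistic_prediction(y_true, y_prob, labels=None):
    # Binarize y_true against the provided (or inferred) labels.
    lb = LabelBinarizer()
    lb.fit(labels if labels is not None else y_true)
    transformed = lb.transform(y_true)
    if transformed.shape[1] == 1:
        # Binary case: LabelBinarizer yields a single column; expand to two.
        transformed = np.append(1 - transformed, transformed, axis=1)
    # y_prob is assumed to already be a 2D (n_samples, n_classes) array here.
    if y_prob.shape[1] != transformed.shape[1]:
        raise ValueError(f'The number of classes in labels is different '
                         f'from that in y_prob. Classes found in '
                         f'labels: {lb.classes_}')
    return transformed, y_prob, lb.classes_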

Author:

Not sure about the convention here. Should I put this function in sklearn.metrics._base or in sklearn.metrics._classification?

    [[0.2, 0.7, 0.1],
     [0.6, 0.2, 0.2],
     [0.6, 0.1, 0.3]]),
    .41333333)
Member:

Could you please split this test into two: one for invalid inputs and error messages, and one for the expected numerical values of valid inputs?

Also, could you please add a test that checks that perfect predictions lead to a Brier score of 0 and perfectly wrong predictions lead to a Brier score of 2 (both for a 2-class and a 3-class classification problem)?
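
Something along these lines for the perfect/worst-case checks (a sketch using a stand-in implementation of the general formulation; the PR's function name and signature may differ):

import numpy as np

def multiclass_brier(y_true, y_prob):
    # Stand-in for the PR's function: mean over samples of the squared error
    # summed over classes.
    onehot = np.eye(y_prob.shape[1])[y_true]
    return np.mean(np.sum((y_prob - onehot) ** 2, axis=1))

# 3-class problem
y_true = np.array([0, 1, 2])
perfect = np.eye(3)[y_true]          # probability 1 on the true class
wrong = np.eye(3)[(y_true + 1) % 3]  # probability 1 on a wrong class
assert multiclass_brier(y_true, perfect) == 0.0
assert multiclass_brier(y_true, wrong) == 2.0

# 2-class problem
y_true = np.array([0, 1, 1])
assert multiclass_brier(y_true, np.eye(2)[y_true]) == 0.0
assert multiclass_brier(y_true, np.eye(2)[1 - y_true]) == 2.0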

Author:

👍

case as well. When used for the binary case, `multiclass_brier_score_loss`
returns a Brier score that is exactly twice the value returned by this
function.

Member:

Please restate the LaTeX equation from the docstring here.

Author:

I copied the equation for multiclass_brier_score_loss as asked. But the docstring for brier_score_loss did not include the alternate (i.e. original) formula, so I added that equation to its docstring as well.

aggvarun01 and others added 4 commits November 4, 2020 17:02
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@aggvarun01 (Author) commented:
@ogrisel

Apologies for the late reply.

I implemented the changes you suggested. However, there are still a few things that need review/approval:

  1. Re: the label-order discussion above. I added a warning when the label order differs, and put it in the suggested _validate_multiclass_probabilistic_prediction helper so that the warning is raised for log_loss as well.
  2. Leaving the pos_label handling for another PR, unless you think otherwise.
  3. Added _validate_multiclass_probabilistic_prediction to sklearn.metrics._classification itself. Let me know if that is the best place to put it, or whether I should move it to sklearn.metrics._base.
  4. The metric symmetry test fails with the new implementation of log_loss, which only changes where np.clip is applied. It turns out this test was supposed to fail, and was passing merely because of a floating-point error. The way I verified this was as follows:
import numpy as np

from sklearn.metrics import log_loss

y_true = np.array([0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1])
y_pred = np.array([0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0])

def add_dimension(y_prob):
    y_prob = y_prob[:, np.newaxis]
    y_prob = np.append(1 - y_prob, y_prob, axis=1)
    return y_prob

# This is what the test is doing, which gives different results
print(log_loss(y_pred, y_true))  # 15.542609297195847
print(log_loss(y_true, y_pred))  # 15.542649277067358

# alternate way to run the same test, which gives the same results
print(log_loss(y_true, add_dimension(y_pred)))  # 15.542449377709806
print(log_loss(y_pred, add_dimension(y_true)))  # 15.542449377709806

I have removed log_loss from NOT_SYMMETRIC_METRICS for now. I don't know if symmetry even makes sense for log_loss, since it takes probabilities.

Labels: module:metrics, Superseded (PR has been replaced by a newer PR)

Successfully merging this pull request may close these issues: Add multiclass support for brier_score_loss (#16055)