Add multiclass support for brier_score_loss · Issue #16055 · scikit-learn/scikit-learn · GitHub

Closed
aggvarun01 opened this issue Jan 8, 2020 · 4 comments · Fixed by #22046

Comments

@aggvarun01

Feature Request: The original formulation for Brier score inherently supports multiclass classification [source]. This is currently absent in scikit-learn, which restricts Brier score to binary classification.

I have found it useful to have multiclass brier_score_loss when comparing calibration, and have a custom function for the same. I would be happy to work on this and port the code to the scikit-learn API.

@ogrisel
Member
ogrisel commented Sep 9, 2020

Yes, I agree. Would you be interested in contributing a PR? Note that for multiclass problems, the user should not have to pass a pos_label param, even if we have string labels.

Related to #18183 and that might also impact what we are doing in #18114.
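
The point about string labels can be illustrated with a small sketch. It assumes the probability columns follow the sorted order of the class labels (the order `np.unique` returns); `one_hot_from_labels` and its `labels` argument are hypothetical names for illustration, not scikit-learn API.

```python
import numpy as np

def one_hot_from_labels(y_true, labels=None):
    """Encode int or string labels as a one-hot matrix.

    Hypothetical helper, not part of scikit-learn. Columns follow
    `labels` if given, otherwise the sorted unique values of y_true.
    """
    y_true = np.asarray(y_true)
    classes = np.unique(y_true) if labels is None else np.asarray(labels)
    # Compare every sample against every class: shape (n_samples, n_classes).
    one_hot = (y_true[:, None] == classes[None, :]).astype(float)
    if not np.all(one_hot.sum(axis=1) == 1):
        raise ValueError("each y_true value must match exactly one entry of the classes")
    return classes, one_hot

# No pos_label is needed: every class gets its own column.
classes, y_onehot = one_hot_from_labels(["cat", "dog", "bird", "dog"])
print(classes)
print(y_onehot)
```

Because each class has its own indicator column, nothing has to be singled out as the "positive" class, which is why a `pos_label` argument becomes unnecessary in the multiclass setting.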

@cmarmo removed the Needs Info label on Sep 21, 2020
@aggvarun01
Author
aggvarun01 commented Sep 28, 2020

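The version of Brier Score currently implemented in scikit-learn is:

$$BS = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2$$

(where $\hat{y}_i$, i.e. y_hat, is the predicted probability of the positive class and $y_i \in \{0, 1\}$ is the true label)

This can be replaced by the multi-class formula, which is valid for the binary case as well:

$$BS = \frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C}\left(\hat{y}_{ic} - y_{ic}\right)^2$$

(where $\hat{y}_{ic}$ is the predicted probability of class $c$ for sample $i$, and $y_{ic}$ is 1 if sample $i$ belongs to class $c$ and 0 otherwise)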

Unless we use the new formula, we still need 'pos_label' for the binary case. However, using the new formula for the binary case would not be backwards compatible, because it gives twice the value obtained from the old formula. This also means that the Brier score is no longer bounded between 0 and 1: with the new formula it ranges from 0 to 2.
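
The factor of two and the wider range can be checked numerically. The following is a minimal sketch of the two formulas in plain NumPy; `binary_brier` and `multiclass_brier` are hypothetical names chosen for illustration and are not the scikit-learn implementation.

```python
import numpy as np

def binary_brier(y_true, p_pos):
    """Old binary form: mean squared error on the positive-class probability."""
    y_true = np.asarray(y_true, dtype=float)
    p_pos = np.asarray(p_pos, dtype=float)
    return np.mean((p_pos - y_true) ** 2)

def multiclass_brier(y_onehot, proba):
    """Multiclass form: squared error summed over classes, averaged over samples."""
    y_onehot = np.asarray(y_onehot, dtype=float)
    proba = np.asarray(proba, dtype=float)
    return np.mean(np.sum((proba - y_onehot) ** 2, axis=1))

y = np.array([1, 0, 1, 1])
p_pos = np.array([0.9, 0.2, 0.6, 0.3])

# Two-column view of the same data: column 0 = negative, column 1 = positive.
y_onehot = np.column_stack([1 - y, y])
proba = np.column_stack([1 - p_pos, p_pos])

old = binary_brier(y, p_pos)
new = multiclass_brier(y_onehot, proba)
print(old, new)

# A fully confident forecast on the wrong class reaches the upper bound of 2.
worst = multiclass_brier([[1, 0, 0]], [[0, 1, 0]])
print(worst)
```

On this toy data the old formula gives 0.175 and the new one 0.35. The doubling is exact in the binary case because the squared error on the negative-class column equals the squared error on the positive-class column.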

The workaround I can think of is to keep the binary case as is, and then implement the new formula for the multi-class case, which can ignore 'pos_label'. The only downside to this approach is that users can expect to see a sudden jump in Brier score when going from 2 classes to 3. However, I don't expect users to compare Brier scores across models with different numbers of classes, so I think this should be fine.
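
To get a rough sense of the size of that jump, consider an uninformative forecast that assigns probability $1/K$ to each of $K$ classes (an illustrative assumption, not a case discussed in the thread). Under the multi-class formula, every sample contributes:

$$\sum_{c=1}^{K}\left(\frac{1}{K} - y_{c}\right)^2 = \left(1 - \frac{1}{K}\right)^2 + (K-1)\,\frac{1}{K^2} = \frac{K-1}{K}$$

For $K = 3$ this is $2/3$. For $K = 2$ the multi-class formula would give $1/2$, whereas the existing binary formula gives $(1/2 - y)^2 = 1/4$. Keeping the binary case unchanged therefore moves the score of this forecast from $0.25$ to roughly $0.67$ when a third class is added, and most of that gap comes from the factor of two rather than from the extra class.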

The second workaround, similar to the 'measures' package in R, is to have a separate function called multiclass_brier.

Will be very happy to implement this.

[EDITED]

@CarlaFernandez

Hello @ogrisel, is this still not merged? It would be really useful for my current use case. Thank you!

@lorentzenchr changed the title from "[Feature Request/Improvement] Multiclass support for brier_score_loss" to "Add multiclass support for brier_score_loss" on Mar 26, 2023
@lorentzenchr
Member

See #22046 (comment) for a decision that was made on how to implement it.
