Support for multi-class roc_auc scores #3298
I am not certain what that means. Do you have a reference for it?
Here's a pretty decent explanation, along with references: https://www.cs.bris.ac.uk/~flach/ICML04tutorial/ROCtutorialPartIII.pdf
Hmm, what is the recommended scorer while multi-class AUC is not implemented?
Are you talking about what those slides consider an approximation to volume under surface, in which the frequency-weighted average of AUC for each class is taken? This would seem to be identical to the current weighted averaging. Otherwise, those slides, and most references I can find to "multi-class ROC", focus on multi-class calibration of OvR, not on an evaluation metric. Is this what you're interested in? I have no idea how widespread this technique is, whether it's worth having it available in scikit-learn, and whether the greedy optimisation should be improved upon.
Whenever one class is missing from y_true, it's not possible to compute the score. I didn't want to add magic for class inference and get users into trouble.
It's possible that we're not dealing appropriately with the case of y_pred.
It could perhaps be similar to the R function from the pROC package.
Hi, I implemented a draft of the macro-averaged ROC/AUC score, but I am unsure whether it will fit sklearn. Here is the code:
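(The original code snippet did not survive this export. A minimal sketch of the idea described, a macro-averaged one-vs-rest ROC AUC via label binarization, might look like the following; the function name is hypothetical.)

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

def multiclass_roc_auc(y_true, y_score, classes):
    """Hypothetical sketch: macro-averaged one-vs-rest ROC AUC.

    y_score is assumed to hold one score/probability column per class,
    in the same order as `classes`.
    """
    # One binary column per class: 1 where the sample belongs to that class
    y_bin = label_binarize(y_true, classes=classes)
    per_class = [roc_auc_score(y_bin[:, i], y_score[:, i])
                 for i in range(len(classes))]
    return np.mean(per_class)
```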
Could it be as simple as this?
@fbrundu I'm not sure if this is the standard meaning. It is certainly one possible interpretation.
There is a nice summary here: the version of Hand and Till seems to be generally accepted, and I vote we implement that. TL;DR: a PR for Hand and Till is welcome. Optionally also Provost and Domingos, with an option to change the averaging.
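For context, the Hand and Till (2001) M measure is the unweighted mean of pairwise one-vs-one AUCs over all unordered class pairs. A hedged sketch (the function name is my own):

```python
import numpy as np
from itertools import combinations
from sklearn.metrics import roc_auc_score

def hand_till_m(y_true, y_score, classes):
    """Sketch of the Hand & Till (2001) M measure."""
    y_true = np.asarray(y_true)
    pair_aucs = []
    for i, j in combinations(range(len(classes)), 2):
        # Restrict to samples of the two classes in this pair
        mask = np.isin(y_true, [classes[i], classes[j]])
        # A(i|j): how well the class-i score ranks i above j, and vice versa
        a_ij = roc_auc_score(y_true[mask] == classes[i], y_score[mask, i])
        a_ji = roc_auc_score(y_true[mask] == classes[j], y_score[mask, j])
        pair_aucs.append((a_ij + a_ji) / 2)
    return float(np.mean(pair_aucs))
```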
Hi, has there been any progress on implementing this?
@joaquinvanschoren R uses the Hand and Till version. I'd prefer that one, too. I have a student who will work on this soon.
@amueller I can work on this :)
@kchen17 thanks!
We discussed this at OpenML quite a bit. For multiclass AUC there are no guarantees that one approach (macro-averaging, micro-averaging, weighted averaging, ...) is better than another. In R you can find at least 5 different approaches (all also available in MLR now).
I'm happy to have multiple versions. Non-weighted and "not taking label imbalance into account" are two different things ;) Do you have a list and references? What's micro-averaging in this case?
Note that we already have micro- and macro-averaged ROC AUC for multiclass problems implemented in this example: http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html#multiclass-settings
Actually, I think that documentation is incorrect and should say "multilabel".
In micro-averaging, your true positive rate (TPR) is calculated by taking the sum of the TPs of all the classes and dividing by the sum of the TPs and FNs of all the classes, i.e. for a 3-class problem:

TPR_micro = (TP1 + TP2 + TP3) / ((TP1 + FN1) + (TP2 + FN2) + (TP3 + FN3))

Macro-averaging just computes the TPR for each class separately and averages them (weighted by the number of examples in that class, or not):

TPR_macro = (TPR1 + TPR2 + TPR3) / 3

Maybe this helps (it uses precision, but the idea is the same). I would personally never use an unweighted macro-average, but I'll see if I can find the papers that studied this.
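To make the micro/macro distinction concrete, a small sketch with a made-up 3-class confusion matrix (the numbers are illustrative only):

```python
import numpy as np

# Rows = true class, columns = predicted class (illustrative numbers)
cm = np.array([[50,  2,  3],
               [10, 20,  5],
               [ 1,  2,  7]])

tp = np.diag(cm)          # per-class true positives
support = cm.sum(axis=1)  # TP + FN per class (row sums)

micro_tpr = tp.sum() / support.sum()     # pool the counts, then divide
macro_tpr = (tp / support).mean()        # per-class rates, then average
weighted_tpr = np.average(tp / support, weights=support)
```

Note that the support-weighted macro-average coincides with the micro-average here, which is exactly why the weighted/unweighted choice matters.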
Paper: This is what is supported in R (with additional literature):
Hi! I was able to start looking into this issue last week, and I wanted to post a quick update/some questions, just to make sure I am on the right track.
Let me know what corrections/suggestions you may have!
Ordinarily, we would avoid creating another function if reusing an existing one is possible. One key thing you should be thinking about is how to test these changes, including changing the traits of the metric in the common tests.
@joaquinvanschoren interestingly, that paper didn't discuss any of the multi-class AUC papers mentioned above, in particular not the Fawcett paper from 2005. Hm, I guess it's a renormalization of the 1-vs-1 multi-class?
So currently we only have multi-label, and we want to add multi-class with 1-vs-1 and 1-vs-rest, each with weighted and unweighted variants. I propose we add a parameter for this (cc @arjoly). The problem is that to make the Hand and Till measure the default we'd have to default to weighted-average OvO, and we can't really change the weighting option. So maybe we do OvR by default and explain in the narrative docs that OvO with weighting is also a good choice, and add a reference?
The summary of the paper @joaquinvanschoren cited also says that all the AUC versions give pretty much the same results.
@amueller: Had a chance to read your comment again, and I'm a little confused about this part:
I was going to modify the
Sorry, I was expecting you to implement both the OvO and OvR variants.
@amueller: Noted, and that will be incorporated as well! Also wanted to ask: is there any advice on how to detect the difference between multiclass and multilabel? At first, I was just checking the dimensions of y_true.
Multilabel means that multiple labels are predicted at once: you get a 2-d binary indicator matrix as y. Sometimes, people solve the multiclass case by binarizing the output, hence the confusion.
Hi, I hope type_of_target could serve the purpose of differentiating between multiclass and multilabel targets.
type_of_target is fine to distinguish between the y_trues, @amueller.
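To illustrate the distinction type_of_target draws (the example values are my own):

```python
from sklearn.utils.multiclass import type_of_target

# One label per sample, more than two classes
print(type_of_target([0, 2, 1, 1]))            # multiclass

# Binary indicator matrix, possibly several labels per sample
print(type_of_target([[1, 0, 1], [0, 1, 0]]))  # multilabel-indicator
```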
Hi all, I just wanted to let you know that I submitted a preliminary PR. I am interested in hearing feedback on the implementation (e.g. I'm sure there are ways to leverage numpy better than I am doing right now), along with best practices for adding new tests, documentation wording, etc. Thank you for all of the help so far!
Any progress on adding multiclass support for AUC?
@joaquinvanschoren: working on revisions after a code review by @jnothman in #7663. Will likely submit another update on that next week when I've finished with midterms.
Hi @kathyxchen, @jnothman, any updates on the PR?
Just checking in to see if there is any progress on adding multiclass support for AUC?
We have trouble determining what is both an accepted and principled formulation of ROC AUC for multiclass. See #7663 (comment) and below.
So, is there any progress with the multiclass AUC score? I found the official documentation code with the iris dataset very confusing, because that method makes my model look as if it predicts random numbers fairly well.
This is almost done; we need to decide on an API detail before merging: #12789 (comment)
@trendsearcher can you provide an example please? It's now merged but I'd like to see the issue you experienced. |
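For readers arriving later: the merged API exposes this through the multi_class parameter of roc_auc_score (first released in scikit-learn 0.22). A quick sketch of its use on iris:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Column order of predict_proba matches clf.classes_
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)

# One-vs-rest, macro-averaged (average="macro" is the default)
ovr = roc_auc_score(y_te, proba, multi_class="ovr")
# One-vs-one, macro-averaged: the Hand & Till (2001) formulation
ovo = roc_auc_score(y_te, proba, multi_class="ovo")
```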
Glad to help. How can I give an example (it has lots of code and may not be intuitive)? Maybe I can write it in plain text?
@fbrundu Thank you for sharing! I tried your code, but when I call the function I get the error "Multioutput target data is not supported with label binarization". I then removed the line pred = lb.transform(pred) from the function, but now I get "Found input variables with inconsistent numbers of samples: [198, 4284]". Could you help me solve this? Thank you!
You have to use predict instead of predict_proba.
@fbrundu Is your implementation correct? I used it and it works.
@Junting-Wang Thanks for your implementation; it works.
Low-priority feature request: support for multi-class roc_auc score calculation in sklearn.metrics using the one-against-all methodology would be incredibly useful.