Adding Fall-out, Miss rate, specificity as metrics #5516
There is `confusion_matrix`.
Right, I used this to implement one of these rates that I needed. But still, why shouldn't everyone be able to calculate these rates directly? Pulling the data from the unlabeled confusion matrix is also error-prone.
Would an example of getting these from the diagonal and marginals of the confusion matrix suffice?
Why not just add them as new metrics or loss functions?
Usually you want to compute all four, right? Or do you want to use only a single one and select a model using FPR? That would seem a bit odd to me.
Right now I only need them for debugging purposes, so I don't select models at all using this. I wouldn't care much whether there are four different methods or just a single one. However, the current metrics interface usually computes just a single score, so having four separate methods would be more consistent to my mind.
I would be against having to pull the numbers from the confusion matrix, as there are no named entries in this matrix and it is easy to mix up the indices. You then end up with a metric you didn't intend to compute.
Why not; those metrics could be added if there is a good explanation of what they bring compared to what we already have.
In my case there is a paper which used one of these rates, and I want to compare against it.
I agree with @languitar, especially when trying to replicate old results that are reported. This sort of thing can be field-specific (e.g., I'm in neuroscience, where machine learning expertise is low compared to computer science, and many are forced by reviewers to follow methods blindly to be consistent with previous work).
Trying to exploit a confusion matrix, I found this "old" issue. The problem with the confusion matrix is that finding your way through the indices is a nightmare, hence the problems with the definitions of the rates mentioned earlier. I am dreaming of a usable interface for this matrix, something like:
`confusion_matrix(actual, predicted).get(actual_class=2, predicted_class=3)`
Then, it could be extended to define other metrics if necessary, for example in binary classification:
`confusion_matrix(actual, predicted).specificity()`
The major problem I see is that `confusion_matrix` returns a matrix, which means that there is very little room to expand the interface without breaking retrocompatibility. That means that some new syntax is needed (or is there a way to add methods to a matrix, or to return an object that would automatically be cast as a matrix when needed?). Of course, I propose myself as a contributor for this.
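A minimal sketch of the kind of wrapper interface proposed above could look as follows; the class name `LabeledConfusionMatrix` and its methods are hypothetical, not part of scikit-learn, and simply wrap `confusion_matrix`:

```python
from sklearn.metrics import confusion_matrix


class LabeledConfusionMatrix:
    """Hypothetical wrapper giving named access to confusion-matrix entries."""

    def __init__(self, actual, predicted, labels=None):
        self.labels = list(labels) if labels is not None else sorted(set(actual) | set(predicted))
        # scikit-learn convention: rows are true labels, columns are predictions
        self.matrix = confusion_matrix(actual, predicted, labels=self.labels)

    def get(self, actual_class, predicted_class):
        i = self.labels.index(actual_class)
        j = self.labels.index(predicted_class)
        return self.matrix[i, j]

    def specificity(self, negative_class=0):
        # Binary case: TN / (TN + FP), with `negative_class` taken as the negative label.
        neg = self.labels.index(negative_class)
        tn = self.matrix[neg, neg]
        fp = self.matrix[neg].sum() - tn
        return tn / (tn + fp)


cm = LabeledConfusionMatrix(actual=[0, 1, 0, 1], predicted=[1, 1, 1, 0])
print(cm.get(actual_class=0, predicted_class=1))  # 2
print(cm.specificity())                           # 0.0
```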
For the binary case, we recently changed the documentation to make the tp, fp, fn, tn counts more explicit in an example. Does that help?
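For reference, a minimal sketch of what that documented example enables, assuming binary labels {0, 1} and the documented `tn, fp, fn, tp` ordering of `confusion_matrix(...).ravel()`; the data here is only illustrative:

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 0, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

# For binary labels {0, 1}, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

specificity = tn / (tn + fp)   # true negative rate
fall_out = fp / (fp + tn)      # false positive rate, i.e. 1 - specificity
miss_rate = fn / (fn + tp)     # false negative rate, i.e. 1 - recall
```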
Indeed, that does help a bit, but not much more than this. (As a side note, it seems that there is a small syntax error in the documentation file, as that part of the documentation shows everything as a single line; a missing empty line before line 240?)
The main problem is that the syntax does not really make sense; there is nothing intuitive behind `ravel`: if you switch tp and tn, there is no way you can see something is wrong (unless you compare your code to the documentation or work out the code line by line with a debugger on a small example, supposing you know there is something wrong!). On the other hand, if you write this:
`true_negatives = confusion_matrix(actual, predicted).false_positives()`
`true_negatives = confusion_matrix(actual, predicted).get(actual_class=0, predicted_class=2)`
then you can see quite directly that there is something wrong there. What is more, this kind of syntax could easily be extrapolated to multiclass metrics (for example, following this review article: http://rali.iro.umontreal.ca/rali/sites/default/files/publis/SokolovaLapalme-JIPM09.pdf).
Thanks for pointing out the blank line issue. Fixed.
I don't think you should expect an interface like your `get` one, even if I agree the code is more legible. Ultimately, we ascribe a particular convention for which axis is which, and there's little more we can offer while dealing with standard classes of object.
Thanks for that paper from JIPM. I find some of the invariants they talk about useful, but I'm a bit surprised by some of their analysis. It is a pity they do not precisely define tp, fp, fn, tn in the multiclass case (I presume they're just using the standard OvR transform), and that they then describe invariants with respect to a binary contingency matrix rather than properties of the overall confusion matrix and how class distributions interact. For example, you can't "exchange positives and negatives" in a multiclass case without effectively changing the distribution of *all* classes. Thus I find the fact that they copy the invariances from the binary case to the multiclass and hierarchical cases in Table 7 quite strange, as well as a waste of space. In practice, for instance, micro-averaged precision, recall and f-score are identical in the multiclass setting, unless you only define them with respect to a subset of classes. It's very difficult even to reason about what those invariances mean in the multiclass case. But maybe I've missed something in my skim read.
I could imagine a function which provided this kind of OvR-transformed tp, fp, fn, tn counts for each class. But it's really not hard to get these from the marginals and diagonal of a confusion matrix, and again this could be illustrated as an example.
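As an illustration of that suggestion (not an existing scikit-learn helper), the per-class one-vs-rest counts can be read off the diagonal and marginals of the confusion matrix, and the rates follow directly; the data below is made up:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 2, 2, 1, 0, 2]
y_pred = [0, 2, 2, 1, 1, 0, 2]

cm = confusion_matrix(y_true, y_pred)   # rows: true labels, columns: predictions

tp = np.diag(cm)                # correctly predicted, per class
fn = cm.sum(axis=1) - tp        # row marginal minus diagonal
fp = cm.sum(axis=0) - tp        # column marginal minus diagonal
tn = cm.sum() - (tp + fp + fn)  # everything else

specificity = tn / (tn + fp)    # per-class true negative rate
fall_out = fp / (fp + tn)       # per-class false positive rate
miss_rate = fn / (fn + tp)      # per-class false negative rate
```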
To be honest, I did not see the issue before. Since it has been tagged, and if an implementation of those metrics is still of interest, we have implemented a
I completely agree with @josephdviviano. However, I am not sure that this is a good enough argument to implement them.
I think they would make good additions. However, special care needs to be taken in the documentation to highlight what they bring to the error-measurement process.
I just realized that there is a relationship between the specificity and sensitivity (recall) scores of the two classes.
[classification report: avg / total row with precision 0.98, recall 0.98, f1-score 0.98, support 143]
Confusion matrix (rows = actual, columns = predicted): [[54, 3], [0, 86]].
Precision is TP / (TP + FP). Precision for class 0 is 54 / (54 + 0) = 1. Specificity for class 0 is 86 / (86 + 0) = 1. Sensitivity (recall) for class 0 is 54 / (54 + 3) ≈ 0.95, which is the same as the specificity for class 1. So the classification report already gives us the specificity scores for both classes: each one is the recall of the opposite class.
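A quick way to check that relationship in code, as a sketch under the assumption of binary labels {0, 1} with class 1 as the positive class: the specificity for class 1 is the recall of class 0, which `recall_score` already exposes via `pos_label`.

```python
from sklearn.metrics import recall_score

y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

# With class 1 as the positive class, specificity = TN / (TN + FP),
# which is exactly the recall of class 0.
sensitivity = recall_score(y_true, y_pred, pos_label=1)  # 2/3
specificity = recall_score(y_true, y_pred, pos_label=0)  # 2/3
```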
I'm happy to add additional metrics, but I don't think we should add alias functions for, say, sensitivity. A reasonable implementation of
I'm not sure what
I think it's worth defining functions for each of these, or at least scorers. That would be the simplest API-consistent way to do it, I think, and users do benefit from being able to find these names, certainly now that we support multiple scorers for CV/gridsearch.
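As a hedged sketch of what that could look like today (not an official scikit-learn API for these rates): the rates can be wrapped as scorers with `make_scorer` and passed to the multi-metric `scoring` of `GridSearchCV`. The estimator and parameter grid below are only placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import GridSearchCV

X, y = make_classification(random_state=0)

# Specificity as a scorer: recall of the negative class (binary labels assumed to be {0, 1}).
specificity_scorer = make_scorer(recall_score, pos_label=0)

grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    scoring={"sensitivity": "recall", "specificity": specificity_scorer},
    refit="sensitivity",
)
grid.fit(X, y)
```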
take
Is this still an issue, or is the updated, more verbose output for
No, this is still an open issue. I see that we have not had any bandwidth to review #17265.
I want to take this up.
Hi, if it's still open and something specific needs to be done, can I work on it by creating a new PR?
Hello, I have started working on this.
Please see if the above pull request is satisfactory for your needs. I have added separate functions for miss rate, fall-out, specificity, and sensitivity.
Hello, is this issue still open to work on?
Looks like this feature is awaiting approval here: #19556
It would be nice if these rates were included in the metrics module: fall-out (false positive rate), miss rate (false negative rate), and specificity (true negative rate).