Adding Fall-out, Miss rate, specificity as metrics #5516
There is `confusion_matrix`.
Right, I used this to implement one of these rates that I needed. But still, why shouldn't everyone be able to calculate these rates directly? Pulling the data from the unlabeled confusion matrix is also error-prone.
Would an example of getting these from the diagonal and marginals of the confusion matrix suffice?
Why not just add them as new metrics or loss functions?
Usually you want to compute all four, right? Or do you want to use only a single one and select a model using FPR? That would seem a bit odd to me.
Right now I only need them for debugging purposes, so I don't select models at all using this. I wouldn't care much whether there are four different methods or just a single one. However, the current metrics interface usually computes just a single score, so having four separate methods would be more consistent to my mind.
I would be against having to pull the numbers from the confusion matrix, as there are no named entries in this matrix and it is easy to mix up the indices. You then end up with a metric you didn't intend to compute.
Why not; those metrics could be added if there is a good explanation of what they bring compared to what we already have.
In my case there is a paper which used one of these rates, and I want to compare against it.
I agree with @languitar, especially when trying to replicate old results that are reported. This sort of thing can be field-specific (e.g., I'm in neuroscience, where machine learning expertise is low compared to computer science, and many are forced by reviewers to follow methods blindly to be consistent with previous work).
Trying to exploit a confusion matrix, I found this "old" issue. The problem with the confusion matrix is that finding your way through the indices is a nightmare, hence the problems with the definitions of the rates mentioned earlier. I am dreaming of a usable interface for this matrix, something like:
`confusion_matrix(actual, predicted).get(actual_class=2, predicted_class=3)`
Then, it could be extended to define other metrics if necessary, for example in binary classification:
`confusion_matrix(actual, predicted).specificity()`
The major problem I see is that `confusion_matrix` returns a matrix, which means that there is very little room to expand the interface without breaking retrocompatibility. That means that some new syntax is needed (or is there a way to add methods to a matrix, or to return an object that would automatically be cast as a matrix when needed?). Of course, I propose myself as a contributor for this.
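A minimal sketch of the kind of wrapper interface proposed above could look as follows; the class name `LabeledConfusionMatrix` and its methods are hypothetical, not part of scikit-learn, and simply wrap `confusion_matrix`:

```python
from sklearn.metrics import confusion_matrix


class LabeledConfusionMatrix:
    """Hypothetical wrapper giving named access to confusion-matrix entries."""

    def __init__(self, actual, predicted, labels=None):
        self.labels = list(labels) if labels is not None else sorted(set(actual) | set(predicted))
        # scikit-learn convention: rows are true labels, columns are predictions
        self.matrix = confusion_matrix(actual, predicted, labels=self.labels)

    def get(self, actual_class, predicted_class):
        i = self.labels.index(actual_class)
        j = self.labels.index(predicted_class)
        return self.matrix[i, j]

    def specificity(self, negative_class=0):
        # Binary case: TN / (TN + FP), with `negative_class` taken as the negative label.
        neg = self.labels.index(negative_class)
        tn = self.matrix[neg, neg]
        fp = self.matrix[neg].sum() - tn
        return tn / (tn + fp)


cm = LabeledConfusionMatrix(actual=[0, 1, 0, 1], predicted=[1, 1, 1, 0])
print(cm.get(actual_class=0, predicted_class=1))  # 2
print(cm.specificity())                           # 0.0
```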
For the binary case, we recently changed the documentation to make the tp, fp, fn, tn counts more explicit in an example. Does that help?
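For reference, a minimal sketch of what that documented example enables, assuming binary labels {0, 1} and the documented `tn, fp, fn, tp` ordering of `confusion_matrix(...).ravel()`; the data here is only illustrative:

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 0, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

# For binary labels {0, 1}, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

specificity = tn / (tn + fp)   # true negative rate
fall_out = fp / (fp + tn)      # false positive rate, i.e. 1 - specificity
miss_rate = fn / (fn + tp)     # false negative rate, i.e. 1 - recall
```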
Indeed, that does help a bit, but not much more than this. (As a side note, it seems that there is a small syntax error in the documentation file, as that part of the documentation shows everything as a single line; a missing empty line before line 240?)
The main problem is that the syntax does not really make sense; there is nothing intuitive behind `ravel`: if you switch tp and tn, there is no way you can see something is wrong (unless you compare your code to the documentation or work out the code line by line with a debugger on a small example, supposing you know there is something wrong!). On the other hand, if you write this:
`true_negatives = confusion_matrix(actual, predicted).false_positives()`
`true_negatives = confusion_matrix(actual, predicted).get(actual_class=0, predicted_class=2)`
then you can see quite directly that there is something wrong there. What is more, this kind of syntax could easily be extrapolated to multiclass metrics (for example, following this review article: http://rali.iro.umontreal.ca/rali/sites/default/files/publis/SokolovaLapalme-JIPM09.pdf).
Thanks for pointing out the blank line issue. Fixed.
I don't think you should expect an interface like your `get` one, even if I agree the code is more legible. Ultimately, we ascribe a particular convention for which axis is which, and there's little more we can offer while dealing with standard classes of object.
Thanks for that paper from JIPM. I find some of the invariants they talk about useful, but I'm a bit surprised by some of their analysis. It is a pity they do not precisely define tp, fp, fn, tn in the multiclass case (I presume they're just using the standard OvR transform), and that they then describe invariants with respect to a binary contingency matrix rather than properties of the overall confusion matrix and how class distributions interact. For example, you can't "exchange positives and negatives" in a multiclass case without effectively changing the distribution of *all* classes. Thus I find the fact that they copy the invariances from the binary case to the multiclass and hierarchical cases in Table 7 quite strange, as well as a waste of space. In practice, for instance, micro-averaged precision, recall and f-score are identical in the multiclass setting, unless you only define them with respect to a subset of classes. It's very difficult even to reason about what those invariances mean in the multiclass case. But maybe I've missed something in my skim read.
I could imagine a function which provided this kind of OvR-transformed tp, fp, fn, tn counts for each class. But it's really not hard to get these from the marginals and diagonal of a confusion matrix, and again this could be illustrated as an example.
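As an illustration of that suggestion (not an existing scikit-learn helper), the per-class one-vs-rest counts can be read off the diagonal and marginals of the confusion matrix, and the rates follow directly; the data below is made up:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 2, 2, 1, 0, 2]
y_pred = [0, 2, 2, 1, 1, 0, 2]

cm = confusion_matrix(y_true, y_pred)   # rows: true labels, columns: predictions

tp = np.diag(cm)                # correctly predicted, per class
fn = cm.sum(axis=1) - tp        # row marginal minus diagonal
fp = cm.sum(axis=0) - tp        # column marginal minus diagonal
tn = cm.sum() - (tp + fp + fn)  # everything else

specificity = tn / (tn + fp)    # per-class true negative rate
fall_out = fp / (fp + tn)       # per-class false positive rate
miss_rate = fn / (fn + tp)      # per-class false negative rate
```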
To be honest, I did not see the issue before. Since it has been tagged, and if an implementation of those metrics is still of interest, we have implemented a
I completely agree with @josephdviviano. However, I am not sure that this is a good enough argument to implement them.
I think they would make good additions. However, special care needs to be taken in the documentation to highlight what they bring to the error-measurement process.
I just realized that there is a relationship between the specificity and sensitivity (recall) scores of the two classes.
[classification report: avg / total row with precision 0.98, recall 0.98, f1-score 0.98, support 143]
Confusion matrix (rows = actual, columns = predicted): [[54, 3], [0, 86]].
Precision is TP / (TP + FP). Precision for class 0 is 54 / (54 + 0) = 1. Specificity for class 0 is 86 / (86 + 0) = 1. Sensitivity (recall) for class 0 is 54 / (54 + 3) ≈ 0.95, which is the same as the specificity for class 1. So the classification report already gives us the specificity scores for both classes: each one is the recall of the opposite class.
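A quick way to check that relationship in code, as a sketch under the assumption of binary labels {0, 1} with class 1 as the positive class: the specificity for class 1 is the recall of class 0, which `recall_score` already exposes via `pos_label`.

```python
from sklearn.metrics import recall_score

y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

# With class 1 as the positive class, specificity = TN / (TN + FP),
# which is exactly the recall of class 0.
sensitivity = recall_score(y_true, y_pred, pos_label=1)  # 2/3
specificity = recall_score(y_true, y_pred, pos_label=0)  # 2/3
```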
I'm happy to add additional metrics, but I don't think we should add alias functions for, say, sensitivity. A reasonable implementation of
I'm not sure what
I think it's worth defining functions for each of these, or at least scorers. That would be the simplest API-consistent way to do it, I think, and users do benefit from being able to find these names, certainly now that we support multiple scorers for CV/gridsearch.
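As a hedged sketch of what that could look like today (not an official scikit-learn API for these rates): the rates can be wrapped as scorers with `make_scorer` and passed to the multi-metric `scoring` of `GridSearchCV`. The estimator and parameter grid below are only placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import GridSearchCV

X, y = make_classification(random_state=0)

# Specificity as a scorer: recall of the negative class (binary labels assumed to be {0, 1}).
specificity_scorer = make_scorer(recall_score, pos_label=0)

grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    scoring={"sensitivity": "recall", "specificity": specificity_scorer},
    refit="sensitivity",
)
grid.fit(X, y)
```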
take
Is this still an issue, or is the updated, more verbose output for
No, this is still an open issue. I see that we have not had any bandwidth to review #17265.
I want to take this up.
Hi, if it's still open and something specific needs to be done, can I work on it by creating a new PR?
Hello, I have started working on this.
Please see if the above pull request is satisfactory for your needs. I have added separate functions for miss rate, fall-out, specificity, and sensitivity.
Hello, is this issue still open to work on?
Looks like this feature is awaiting approval here: #19556
It would be nice if these rates were included in the metrics module: fall-out (false positive rate), miss rate (false negative rate), and specificity (true negative rate).