fowlkes_mallows_score returns nan in binary classification #8101


Closed
felix-last opened this issue Dec 22, 2016 · 23 comments · Fixed by #8140
Labels: Bug, Moderate

Comments

@felix-last commented Dec 22, 2016

Description

fowlkes_mallows_score doesn't work properly for large binary classification vectors: it returns values that are not between 0 and 1, or returns nan. In general, the equation shown in the documentation doesn't yield the same results as the function.

Steps/Code to Reproduce

Edited by @jnothman: this reference implementation is incorrect. See comment below.

import numpy as np
import sklearn.metrics

def get_FMI(true, predicted):
    # Classification-style confusion counts; note this is NOT the pairwise
    # definition FMI actually uses (see the edit note above).
    c = sklearn.metrics.confusion_matrix(true, predicted)
    TP = c[1][1]
    FP = c[0][1]
    FN = c[1][0]
    FMI = TP / np.sqrt((TP + FP) * (TP + FN))

    print('Should be', FMI)
    print('Is', sklearn.metrics.fowlkes_mallows_score(true, predicted))

# large vector
get_FMI(np.random.choice([0, 1], 1362), np.random.choice([0, 1], 1362))
# small vector
get_FMI(np.random.choice([0, 1], 100), np.random.choice([0, 1], 100))

Expected Results

Should be 0.487888392921
Is 0.487888392921

Should be 0.548853049023
Is 0.548853049023

Actual Results

Should be 0.487888392921
Is 15.3260054113

Should be 0.548853049023
Is 0.501109879279

Versions

Windows-10-10.0.10586-SP0
Python 3.5.2 |Anaconda custom (64-bit)| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
NumPy 1.11.2
SciPy 0.18.1
Scikit-Learn 0.18.1

@jnothman (Member) commented Dec 22, 2016

The implementation looks quite strange and I'm not persuaded (despite reviewing cosmetic things in the original PR, where there were two other reviews) that the testing suite is sufficient, or even that the reference examples are correct. It's clear from the score > 1 that something's broken.

However, your claim of the correct values ignores that here true positives are defined pairwise: pairs of points that are co-clustered in both y_true and y_pred. It also misses that, unlike the classification "confusion matrix" interpretation, clustering metrics need to be invariant to a permutation of the labels. So you should get fmi(y_true, y_pred) == fmi(y_true, 1 - y_pred) == fmi(1 - y_true, y_pred) == fmi(1 - y_true, 1 - y_pred) for y_true and y_pred in (0, 1).

TP should be calculated as (contingency * (contingency - 1)).sum() / 2. I'm not convinced that this is equivalent to what's currently in the code.
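
For concreteness, a minimal sketch of the pairwise TP and the invariance property described above (not the library implementation; it assumes contingency_matrix is importable from sklearn.metrics.cluster):

import numpy as np
from sklearn.metrics import fowlkes_mallows_score
from sklearn.metrics.cluster import contingency_matrix

y_true = np.random.choice([0, 1], 100)
y_pred = np.random.choice([0, 1], 100)

# Pairwise true positives: pairs of samples placed together in both labelings.
c = contingency_matrix(y_true, y_pred)
tp_pairs = (c * (c - 1)).sum() / 2

# A clustering metric must be invariant to relabeling either side;
# with a correct implementation, all four scores below match.
scores = [fowlkes_mallows_score(t, p)
          for t in (y_true, 1 - y_true)
          for p in (y_pred, 1 - y_pred)]
assert np.allclose(scores, scores[0])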

@felix-last (Author)

Thanks for your response. You're right: the code example above does not account for label permutations; I was only using this algorithm for classification performance evaluation.

@jnothman added the Moderate and Need Contributor labels on Dec 22, 2016
@jnothman (Member)

FMI isn't for classification, precisely because it handles label non-identification.

@jnothman (Member)

Though, ignoring the pairwise aspect, I think the formula is more generally the geometric mean of precision and recall, which is occasionally used for classification (rarely; the harmonic mean, i.e. the F-score, and the arithmetic mean are more common).
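
For anyone who, like the reporter, wants that classification analogue, a brief sketch (g_mean_precision_recall is a hypothetical helper, not a scikit-learn API):

import numpy as np
from sklearn.metrics import precision_score, recall_score

def g_mean_precision_recall(y_true, y_pred):
    # Geometric mean of precision and recall for binary classification.
    p = precision_score(y_true, y_pred)
    r = recall_score(y_true, y_pred)
    return np.sqrt(p * r)

# precision = 1.0, recall = 2/3, so this prints ~0.8165
print(g_mean_precision_recall([0, 1, 1, 0, 1], [0, 1, 0, 0, 1]))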

@jnothman (Member) commented Dec 22, 2016 via email

@felix-last (Author)

You're right: what I want in order to rate the classification is the geometric mean of precision and recall, so FMI isn't quite right. Anyway, I found this bug along the way.

@jnothman (Member)

Thank you. I think it would be a good idea to have something like pairwise_confusion_matrix as a separate function to calculate pairwise TP, FP, FN, TN.
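
A rough sketch of what such a function could look like (pairwise_confusion_matrix is hypothetical and does not exist in scikit-learn; the counts are over pairs of samples, not individual samples):

import numpy as np
from sklearn.metrics.cluster import contingency_matrix

def pairwise_confusion_matrix(labels_true, labels_pred):
    c = contingency_matrix(labels_true, labels_pred).astype(np.int64)
    n = c.sum()
    row, col = c.sum(axis=1), c.sum(axis=0)
    tp = (c * (c - 1)).sum() // 2            # pairs together in both labelings
    fp = (col * (col - 1)).sum() // 2 - tp   # together in labels_pred only
    fn = (row * (row - 1)).sum() // 2 - tp   # together in labels_true only
    tn = n * (n - 1) // 2 - tp - fp - fn     # together in neither
    return tp, fp, fn, tn

With these counts, FMI would be tp / np.sqrt((tp + fp) * (tp + fn)), mirroring the reporter's formula but over pairs.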

@devanshdalal (Contributor)

Hi, I want to work on this one if it's still open.

@jnothman (Member)

It appears to be open. I await your PR.

@gan3sh500

tk, pk and qk follow the same equations as the reference given in the documentation. Running the code above gave me an error at
tk / np.sqrt(pk * qk) if tk != 0. else 0.
which I could fix with
tk / np.sqrt(pk) / np.sqrt(qk) if tk != 0. else 0.
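
One plausible mechanism for why splitting the square root helps, consistent with the Windows-only reports later in this thread: on platforms where NumPy's default integer is 32-bit, pk * qk can silently wrap around and go negative, so the square root becomes nan. The magnitudes below are illustrative, not taken from the issue:

import numpy as np

pk = np.int32(600000)  # pair counts of this order arise for n ~ 1400
qk = np.int32(600000)

print(pk * qk)                    # wraps around in int32: a negative number
print(np.sqrt(pk * qk))           # sqrt of a negative integer -> nan
print(np.sqrt(pk) * np.sqrt(qk))  # promotes to float64 first: 600000.0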

@devanshdalal (Contributor) commented Dec 28, 2016

Hi @gan3sh500, thanks for your response, but could you please let me handle this bug?

@devanshdalal (Contributor) commented Dec 28, 2016

@jnothman, I think in the reference, tk is actually (contingency * contingency).sum() - n, which equals (contingency * (contingency - 1)).sum() since contingency.sum() == n. The factor of 1/2 is common to the numerator and denominator, so it cancels.
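
A quick numeric check of that identity (the contingency values here are arbitrary, just for illustration):

import numpy as np

c = np.random.randint(0, 10, size=(3, 4))  # stand-in contingency table
n = c.sum()
# (c * c).sum() - n == (c * (c - 1)).sum() because c.sum() == n
assert (c * c).sum() - n == (c * (c - 1)).sum()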

@jnothman (Member)

I can't really analyse these small remarks. Things get much more concrete with a pull request and test cases.

@devanshdalal (Contributor)

@jnothman, working on it; will create a pull request soon.

@devanshdalal (Contributor)

@felix-last, I am not able to reproduce values outside of 0 and 1, or nan, with the existing fowlkes_mallows_score() function.

Python 2.7.6 (default, Oct 26 2016, 20:30:19) [GCC 4.8.4]
NumPy 1.8.2
SciPy 0.13.3
Scikit-Learn 0.19.dev0

@jnothman, the reference is calculating the same thing, just in a different way (see the start of page 554, where the proof begins).

Please correct me if you find something inconsistent about these facts.

@jnothman (Member)

I'm happy if the implementation is correct, and I find that believable. It still needs more tests, IMO; it only seems to have one non-trivial test.

@devanshdalal (Contributor) commented Dec 30, 2016

Perhaps @felix-last is passing arrays containing nan values as input; I don't know exactly.

@felix-last (Author)

@devanshdalal, have you tried with two large input vectors like
np.random.choice([0,1], 1362)? There are no nan values in there, yet I receive the reported results, i.e. values > 1 or nans.

@devanshdalal (Contributor)

I tried using your exact program and was unable to reproduce any of the errors you reported; everything seemed fine. We are using different configurations; could that be the issue?

I am getting this.

Should be 0.48062324702807802
Is 0.49969133899795842
Should be 0.50009248127333317
Is 0.49165034709585365

@devanshdalal (Contributor)

@jnothman, I have just started here. Please point me to where and what kind of tests you want added, and I will add those test cases.

@jnothman (Member)

I wonder if there are overflow errors happening on Windows, leading to the NaN...?

@jnothman (Member)

@devanshdalal: I'd just hoped there was more than one unit test using a hand-crafted example and checking the result was correct. Indeed, a test for symmetry would also be appropriate.
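
A sketch of the kind of test being requested (the test name is illustrative; the expected value is hand-checked against the pairwise definition: 2 pairs together in both labelings, 3 pairs together in y_pred, 6 pairs together in y_true):

import numpy as np
from sklearn.metrics import fowlkes_mallows_score

def test_fowlkes_mallows_score_extra():
    y_true = [0, 0, 0, 1, 1, 1]
    y_pred = [0, 0, 1, 1, 2, 2]
    expected = 2. / np.sqrt(3. * 6.)  # TP / sqrt((TP + FP) * (TP + FN))
    assert np.isclose(fowlkes_mallows_score(y_true, y_pred), expected)

    # Symmetry: swapping the arguments must not change the score.
    assert np.isclose(fowlkes_mallows_score(y_pred, y_true),
                      fowlkes_mallows_score(y_true, y_pred))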

@devanshdalal (Contributor) commented Dec 31, 2016

@jnothman, could it be a bug in the NumPy functions we are using on Windows? I will try on my Windows machine and see if the errors persist. I will add the tests in the next pull request for this issue.
