Closed
Description
fowlkes_mallows_score doesn't work properly for large binary classification vectors: it returns values that are not between 0 and 1, or returns nan. In general, the equation shown in the documentation doesn't yield the same results as the function.
Steps/Code to Reproduce
Edited by @jnothman: this reference implementation is incorrect. See comment below.
import numpy as np
import sklearn.metrics


def get_FMI(true, predicted):
    # Classification confusion matrix: rows are true labels, columns are predicted labels.
    c = sklearn.metrics.confusion_matrix(true, predicted)
    TP = c[1][1]
    FP = c[0][1]
    FN = c[1][0]
    FMI = TP / np.sqrt((TP + FP) * (TP + FN))
    print('Should be', FMI)
    print('Is', sklearn.metrics.fowlkes_mallows_score(true, predicted))


# large vector
get_FMI(np.random.choice([0, 1], 1362), np.random.choice([0, 1], 1362))
# small vector
get_FMI(np.random.choice([0, 1], 100), np.random.choice([0, 1], 100))
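As the edit note above says, get_FMI is not a valid reference. The Fowlkes-Mallows index is a clustering metric: its TP, FP and FN count pairs of samples that are grouped together (or not) in the two labelings, not individual samples taken from a classification confusion matrix. A minimal brute-force sketch of the pair-counting definition follows. The name pairwise_fmi is hypothetical and not part of scikit-learn; the loop is O(n^2) and meant for checking small inputs only.

```python
from itertools import combinations
import math


def pairwise_fmi(labels_true, labels_pred):
    """Brute-force pair-counting Fowlkes-Mallows index (illustrative helper)."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1  # pair grouped together in both labelings
        elif same_pred:
            fp += 1  # together in labels_pred only
        elif same_true:
            fn += 1  # together in labels_true only
    if tp == 0:
        return 0.0
    # Python ints do not overflow, so the product is safe here.
    return tp / math.sqrt((tp + fp) * (tp + fn))


# Swapping the label names does not change the grouping of pairs.
print(pairwise_fmi([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

Because the score depends only on how pairs are grouped, it will generally differ from the per-sample formula in get_FMI, even when the library function behaves correctly.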
Expected Results
Should be 0.487888392921
Is 0.487888392921
Should be 0.548853049023
Is 0.548853049023
Actual Results
Should be 0.487888392921
Is 15.3260054113
Should be 0.548853049023
Is 0.501109879279
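One possible explanation for the out-of-range value on the large vector, not confirmed in this report, is integer overflow. The sketch below assumes the score is computed from contingency-table sums as tk / sqrt(pk * qk) and that those sums are held in 32-bit integers, as NumPy's default integer was on Windows in these versions. The helper fmi_from_counts and its dtype parameter are hypothetical and exist only to make the effect reproducible on any platform.

```python
import numpy as np


def fmi_from_counts(labels_true, labels_pred, dtype):
    """Return (naive, rearranged) FMI using pair counts stored as `dtype`."""
    _, ti = np.unique(labels_true, return_inverse=True)
    _, pi = np.unique(labels_pred, return_inverse=True)
    # Contingency table: c[i, j] = samples with true class i and predicted cluster j.
    c = np.zeros((ti.max() + 1, pi.max() + 1), dtype=dtype)
    np.add.at(c, (ti, pi), 1)
    n = dtype(len(labels_true))
    tk = (c ** 2).sum(dtype=dtype) - n
    pk = (c.sum(axis=0, dtype=dtype) ** 2).sum(dtype=dtype) - n
    qk = (c.sum(axis=1, dtype=dtype) ** 2).sum(dtype=dtype) - n
    with np.errstate(all="ignore"):
        # pk * qk can exceed the range of a 32-bit integer and wrap around.
        naive = tk / np.sqrt(pk * qk)
        # Dividing first keeps every intermediate value in floating point.
        rearranged = np.sqrt(tk / pk) * np.sqrt(tk / qk)
    return float(naive), float(rearranged)


rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1362)
y_pred = rng.integers(0, 2, size=1362)

print("int32:", fmi_from_counts(y_true, y_pred, np.int32))
print("int64:", fmi_from_counts(y_true, y_pred, np.int64))
```

Under these assumptions, the int32 naive value is wrong (or nan) because the product wraps around, while the rearranged form agrees with the 64-bit result. A vector of length 100 produces sums small enough that the product still fits in 32 bits, which would be consistent with the small case returning a plausible value.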
Versions
Windows-10-10.0.10586-SP0
Python 3.5.2 |Anaconda custom (64-bit)| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
NumPy 1.11.2
SciPy 0.18.1
Scikit-Learn 0.18.1