fowlkes_mallows_score returns nan in binary classification · Issue #8101 · scikit-learn/scikit-learn · GitHub
Closed
@felix-last

Description

fowlkes_mallows_score does not work properly for large binary classification vectors: it returns values outside the range [0, 1], or nan. More generally, the equation shown in the documentation does not give the same results as the function.
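For reference, the documented definition counts pairs of samples, not individual samples: TP is the number of pairs placed in the same cluster by both `labels_true` and `labels_pred`, FP the pairs together in `labels_true` only, and FN the pairs together in `labels_pred` only. Written in terms of a contingency table $n_{ij}$ (rows are the true clusters, with sizes $a_i$; columns are the predicted clusters, with sizes $b_j$), the standard pair-counting identities give:

```latex
\mathrm{FMI} = \frac{TP}{\sqrt{(TP + FP)\,(TP + FN)}},
\qquad
TP = \sum_{i,j} \binom{n_{ij}}{2},
\qquad
TP + FP = \sum_{i} \binom{a_i}{2},
\qquad
TP + FN = \sum_{j} \binom{b_j}{2}
```

Because only pair co-membership matters, renaming the cluster labels (for example, swapping 0 and 1) leaves the score unchanged.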

Steps/Code to Reproduce

Edited by @jnothman: this reference implementation is incorrect. See comment below.

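from sklearn.metrics import confusion_matrix, fowlkes_mallows_score
import numpy as np

def get_FMI(true, predicted):
    # Per-sample confusion matrix: rows are true labels, columns are predicted labels
    c = confusion_matrix(true, predicted)
    TP = c[1][1]  # true 1, predicted 1
    FP = c[0][1]  # true 0, predicted 1
    FN = c[1][0]  # true 1, predicted 0
    # Note: these count individual samples, whereas the documented FMI counts pairs of samples
    FMI = TP / np.sqrt((TP + FP) * (TP + FN))

    print('Should be', FMI)
    print('Is', fowlkes_mallows_score(true, predicted))

# large vector (unseeded random labels, so exact values differ from run to run)
get_FMI(np.random.choice([0, 1], 1362), np.random.choice([0, 1], 1362))
# small vector
get_FMI(np.random.choice([0, 1], 100), np.random.choice([0, 1], 100))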
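As the editor's note says, the reference function above is not the documented metric, which is one reason the small-vector numbers disagree. The following is a minimal pure-Python sketch of the pair-counting definition, set beside the per-sample formula used above. The function names `fmi_pairwise` and `fmi_per_sample` are hypothetical, and the O(n^2) loop over pairs is meant only for checking small inputs. Swapping the label names leaves the partition unchanged, so the pair-based score stays at 1.0 while the per-sample formula drops to 0.0.

```python
from itertools import combinations
import math

def fmi_pairwise(labels_true, labels_pred):
    """Fowlkes-Mallows index from explicit pair counts (documented definition).

    Loops over all O(n^2) pairs of samples; for checking small inputs only.
    """
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1
        elif same_true:
            fp += 1  # together in labels_true only
        elif same_pred:
            fn += 1  # together in labels_pred only
    # If tp > 0, both factors of the denominator are positive
    return tp / math.sqrt((tp + fp) * (tp + fn)) if tp else 0.0

def fmi_per_sample(labels_true, labels_pred):
    """The issue's reference formula: per-sample TP/FP/FN for class 1."""
    zipped = list(zip(labels_true, labels_pred))
    tp = sum(t == 1 and p == 1 for t, p in zipped)
    fp = sum(t == 0 and p == 1 for t, p in zipped)
    fn = sum(t == 1 and p == 0 for t, p in zipped)
    denom = (tp + fp) * (tp + fn)
    return tp / math.sqrt(denom) if denom else 0.0

# Same partition, cluster names swapped
print(fmi_pairwise([0, 0, 1, 1], [1, 1, 0, 0]))    # 1.0
print(fmi_per_sample([0, 0, 1, 1], [1, 1, 0, 0]))  # 0.0
```

Assuming the library follows the documented definition, `fowlkes_mallows_score` should be compared against something like `fmi_pairwise` (for example, on the 100-sample vectors), not against the per-sample formula.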

Expected Results

Should be 0.487888392921
Is 0.487888392921

Should be 0.548853049023
Is 0.548853049023

Actual Results

Should be 0.487888392921
Is 15.3260054113

Should be 0.548853049023
Is 0.501109879279
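The out-of-range value for the large vector (and the nan in the title) is consistent with integer overflow. This is an assumption, not a confirmed diagnosis. Suppose the score is computed from contingency-table counts as tk / sqrt(pk * qk), where tk, pk and qk are hypothetical names for twice the TP, TP + FN and TP + FP pair counts. Suppose also that these counts are held in 32-bit integers, as NumPy's default integer is on Windows (see Versions). At n = 1362 the product pk * qk then exceeds 2^31 and wraps, while at n = 100 it does not. Python integers never overflow, so the sketch below simulates the 32-bit wraparound explicitly. It then contrasts this with dividing before multiplying, sqrt(tk / pk) * sqrt(tk / qk), which never forms the large product. The 2x2 table is a made-up balanced example with the issue's sample size.

```python
import math

def wrap_int32(x: int) -> int:
    """Map an exact Python int onto the signed 32-bit range, as two's-complement overflow would."""
    return (x + 2**31) % 2**32 - 2**31

def pair_counts(table):
    """Return (tk, pk, qk) for a contingency table given as a list of rows (rows = true labels).

    Each value is twice the corresponding pair count, which leaves tk / sqrt(pk * qk) unchanged.
    """
    n = sum(sum(row) for row in table)
    tk = sum(v * v for row in table for v in row) - n
    pk = sum(sum(col) ** 2 for col in zip(*table)) - n  # predicted-cluster sizes
    qk = sum(sum(row) ** 2 for row in table) - n        # true-cluster sizes
    return tk, pk, qk

def fmi_naive_int32(tk, pk, qk):
    """tk / sqrt(pk * qk) with the product forced into 32 bits (simulated overflow)."""
    prod = wrap_int32(pk * qk)
    if prod < 0:
        return float("nan")  # NumPy's sqrt of a negative number gives nan
    if prod == 0:
        return float("inf")
    return tk / math.sqrt(prod)

def fmi_stable(tk, pk, qk):
    """Same ratio as sqrt(tk / pk) * sqrt(tk / qk): divide first, no large product."""
    if tk == 0:
        return 0.0
    return math.sqrt(tk / pk) * math.sqrt(tk / qk)

# Hypothetical balanced table with n = 1362, the issue's large sample size
table = [[340, 341], [341, 340]]
tk, pk, qk = pair_counts(table)
print(fmi_naive_int32(tk, pk, qk))  # nan: the wrapped product is negative
print(fmi_stable(tk, pk, qk))
```

Under the same assumption, a wrap to a positive value is just as wrong. Any wrapped product lies below 2^31, so for counts like these tk / sqrt(prod) would come out at roughly 10 or more, the kind of out-of-range value reported above.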

Versions

Windows-10-10.0.10586-SP0
Python 3.5.2 |Anaconda custom (64-bit)| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
NumPy 1.11.2
SciPy 0.18.1
Scikit-Learn 0.18.1

Metadata

Assignees

No one assigned

Labels

Bug, Moderate (Anything that requires some knowledge of conventions and best practices)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
