Incorrect Clusters Due To Dtype Mismatch · Issue #10832 · scikit-learn/scikit-learn · GitHub
Incorrect Clusters Due To Dtype Mismatch #10832
Closed
@Wikilicious

Description

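Degeneracies are not removed from a float32 matrix because the tie-breaking noise is hard-coded to float64 precision. The noise is scaled by float64 eps (about 2.2e-16), far below float32 resolution (eps about 1.2e-7), and the np.finfo(np.double).tiny * 100 term (about 2.2e-306) underflows to zero in float32. As a result, the in-place addition leaves S unchanged and exact ties survive. This causes incorrect clusters in special cases.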

Here is the root cause of the issue:

S += ((np.finfo(np.double).eps * S + np.finfo(np.double).tiny * 100) *
      random_state.randn(n_samples, n_samples))
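A minimal NumPy sketch of the failure follows. It assumes the 4x4 matrix from this issue's reproduction and uses a NumPy RandomState with seed 0 as a stand-in for the estimator's random_state. add_float64_noise is a hypothetical name. The quoted formula is applied in place to a float32 copy and to a float64 copy.

```python
import numpy as np

def add_float64_noise(S, seed=0):
    """Hypothetical helper: apply the quoted noise formula in place.

    The constants always come from np.double, regardless of S.dtype.
    """
    random_state = np.random.RandomState(seed)  # stand-in for the estimator's random_state
    n_samples = S.shape[0]
    S += ((np.finfo(np.double).eps * S + np.finfo(np.double).tiny * 100) *
          random_state.randn(n_samples, n_samples))
    return S

k = [[1, 0, 0, 0],
     [0, 1, 1, 0],
     [0, 1, 1, 0],
     [0, 0, 0, 1]]

S32 = np.array(k, dtype=np.float32)
S64 = np.array(k, dtype=np.float64)
noisy32 = add_float64_noise(S32.copy())
noisy64 = add_float64_noise(S64.copy())

# float32: the perturbation is below float32 resolution or underflows to zero.
print(np.array_equal(noisy32, S32))  # True
# float64: the same formula perturbs the matrix, so tied rows 1 and 2 now differ.
print(np.array_equal(noisy64, S64))  # False
```

The in-place S += ... casts the float64 result back to float32, so every perturbation is rounded away. Identical rows stay identical, and Affinity Propagation still sees exact ties.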

Affinity Propagation calls as_float_array(), which accepts both float32 and float64 input, but then hard-codes float64 throughout the rest of the function.

The ideal solution is to take the dtype of the input matrix and use it throughout the code. The other working arrays (A, R, tmp, e) should also be allocated with the same dtype as the user's input.
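The proposed fix could look something like the sketch below. This is not the patch that was merged. preprocess_similarity is a hypothetical helper, S is assumed to be a square floating-point array, and e is left out for brevity. The noise constants come from np.finfo(S.dtype) instead of np.double, and the working arrays share the input dtype.

```python
import numpy as np

def preprocess_similarity(S, random_state):
    """Hypothetical helper: add tie-breaking noise and allocate working arrays in S's dtype.

    Assumes S is a square float32 or float64 array.
    """
    S = np.array(S, copy=True)  # work on a copy; leave the caller's matrix untouched
    dtype = S.dtype
    finfo = np.finfo(dtype)  # eps/tiny of the input dtype instead of np.double
    n_samples = S.shape[0]

    noise = (finfo.eps * S + finfo.tiny * 100) * random_state.randn(n_samples, n_samples)
    S += noise.astype(dtype)

    # Working arrays named after those in the issue, allocated with the input dtype.
    A = np.zeros((n_samples, n_samples), dtype=dtype)
    R = np.zeros((n_samples, n_samples), dtype=dtype)
    tmp = np.zeros((n_samples, n_samples), dtype=dtype)
    return S, A, R, tmp

k = np.array([[1, 0, 0, 0],
              [0, 1, 1, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 1]], dtype=np.float32)
S, A, R, tmp = preprocess_similarity(k, np.random.RandomState(0))
print(S.dtype, A.dtype)  # float32 float32
print(np.array_equal(S[1], S[2]))  # False: the tie between samples 1 and 2 is broken
```

Scaling by float32 eps (about 1.2e-7) and float32 tiny keeps the noise representable in the input dtype. The zero entries in particular receive distinct nonzero values, which is enough to separate the duplicated rows.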

Steps/Code to Reproduce

Here is a minimal example where there should intuitively be 3 clusters: samples 1 and 2 are identical, while samples 0 and 3 each stand alone.

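import numpy as np
import sklearn.cluster

# float32 similarity matrix: samples 1 and 2 are identical, so their similarities tie.
k = np.array([[1, 0, 0, 0],
              [0, 1, 1, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 1]], dtype='float32')

# affinity='precomputed' treats k directly as the similarity matrix.
afp = sklearn.cluster.AffinityPropagation(preference=1, affinity='precomputed').fit(k)
print(afp.labels_)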

Expected Results

array([0, 1, 1, 2], dtype=int64)

If k is float64, it gives the correct results.

Actual Results

array([0, 0, 0, 1], dtype=int64)

Versions

Windows-10-10.0.16299-SP0
Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 bit (AMD64)]
NumPy 1.14.0
SciPy 1.0.0
Scikit-Learn 0.19.1
