Incorrect Clusters Due To Dtype Mismatch · Issue #10832 · scikit-learn/scikit-learn

Closed
Wikilicious opened this issue Mar 18, 2018 · 1 comment · Fixed by #17995

Comments

@Wikilicious

Description

Degeneracies are not removed from a float32 matrix because the noise used to remove them is hard-coded to float64 precision and is too small to change float32 values. This causes incorrect clusters in special cases.

Here is the root cause of the issue:

S += ((np.finfo(np.double).eps * S + np.finfo(np.double).tiny * 100) *
      random_state.randn(n_samples, n_samples))

Affinity Propagation uses as_float_array(), which accepts both float32 and float64 input, but then hard-codes float64 throughout the rest of the code.
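The effect of the quoted line on float32 input can be checked with NumPy alone. The sketch below is a minimal reproduction, assuming the quoted expression runs on S exactly as written; the function name add_float64_noise is hypothetical. np.finfo(np.double).eps is about 2.2e-16, far below the float32 spacing near 1.0 (about 1.2e-7), and np.finfo(np.double).tiny * 100 (about 2.2e-306) underflows to zero in float32. Because the in-place += stores the result back into a float32 array, the float32 matrix comes out numerically unchanged, while the float64 matrix does not.

```python
import numpy as np


def add_float64_noise(S, random_state):
    """Hypothetical helper reproducing the quoted degeneracy-removal line.

    The noise scale is always taken from float64 (np.double), whatever S.dtype is.
    """
    n_samples = S.shape[0]
    S += ((np.finfo(np.double).eps * S + np.finfo(np.double).tiny * 100) *
          random_state.randn(n_samples, n_samples))
    return S


# Same similarity matrix as in the reproduction example (preference=1 on the diagonal).
k = [[1, 0, 0, 0],
     [0, 1, 1, 0],
     [0, 1, 1, 0],
     [0, 0, 0, 1]]

unchanged = {}
for dtype in (np.float32, np.float64):
    S = np.array(k, dtype=dtype)
    before = S.copy()
    add_float64_noise(S, np.random.RandomState(0))
    # The in-place += casts the float64 noise back to S.dtype before storing it.
    unchanged[dtype.__name__] = np.array_equal(S, before)

print(unchanged)  # expected: {'float32': True, 'float64': False}
```

The float32 case stays unchanged for any seed, not only RandomState(0), because every noise term is below float32 resolution at these values.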

The ideal solution is to take the dtype of the input matrix and use it throughout the code. The other variables (A, R, tmp, e) should also be allocated with the same dtype as the user input.
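A minimal sketch of that proposal, assuming the noise should simply follow the input dtype: eps and tiny are read from np.finfo(S.dtype), the Gaussian draws are cast to S.dtype, and the working arrays are allocated with np.zeros_like so they inherit the dtype. The helper names remove_degeneracies and allocate_messages are hypothetical and are not scikit-learn API; only A, R and tmp are shown.

```python
import numpy as np


def remove_degeneracies(S, random_state):
    """Hypothetical dtype-aware version of the degeneracy-removal step.

    eps and tiny come from S.dtype, and the Gaussian noise is cast to
    S.dtype, so float32 input gets noise at float32 resolution.
    """
    finfo = np.finfo(S.dtype)
    n_samples = S.shape[0]
    noise = random_state.randn(n_samples, n_samples).astype(S.dtype)
    S += (finfo.eps * S + finfo.tiny * 100) * noise
    return S


def allocate_messages(S):
    """Hypothetical helper: allocate the working arrays with S's dtype."""
    A = np.zeros_like(S)    # availabilities
    R = np.zeros_like(S)    # responsibilities
    tmp = np.zeros_like(S)  # scratch buffer
    return A, R, tmp


S = np.array([[1, 0, 0, 0],
              [0, 1, 1, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 1]], dtype=np.float32)
before = S.copy()
remove_degeneracies(S, np.random.RandomState(0))
A, R, tmp = allocate_messages(S)

print(S.dtype, np.array_equal(S, before))  # float32 False
print(A.dtype, R.dtype, tmp.dtype)         # float32 float32 float32
```

Casting the random draws to S.dtype keeps the whole noise expression in float32 for float32 input, regardless of how NumPy promotes the scalar eps and tiny values.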

Steps/Code to Reproduce

Here is a very simple example where you can intuitively see there should be 3 clusters.

import sklearn.cluster
import numpy as np

k = np.array([[1,0,0,0],
              [0,1,1,0],
              [0,1,1,0],
              [0,0,0,1]], dtype='float32')

afp = sklearn.cluster.AffinityPropagation(preference=1, affinity='precomputed').fit(k)
print(afp.labels_)

Expected Results

array([0, 1, 1, 2], dtype=int64)

If k is float64, it gives the correct results.

Actual Results

array([0, 0, 0, 1], dtype=int64)

Versions

Windows-10-10.0.16299-SP0
Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 bit (AMD64)]
NumPy 1.14.0
SciPy 1.0.0
Scikit-Learn 0.19.1

@rth
Member
rth commented Mar 19, 2018

Interesting, thanks for investigating this issue @Wikilicious !

I can reproduce on the master branch with Linux. This is related to the more general #5776 issue. A PR to fix it (with some unit tests) would be welcome.
