Description
Degeneracies are not removed from a float32 matrix because the degeneracy-removal step is hard-coded to use float64. This causes incorrect clusters in special cases.
Here is the root cause of the issue:
S += ((np.finfo(np.double).eps * S + np.finfo(np.double).tiny * 100) *
random_state.randn(n_samples, n_samples))
Affinity Propagation uses as_float_array(), which allows both float32 and float64, but then hard-codes float64 throughout the rest of the code.
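To illustrate why the float64 constants have no effect on float32 input, here is a minimal NumPy-only sketch. It does not call scikit-learn; the 2x2 matrix, the seed, and the variable names are assumptions chosen for illustration. It applies the same noise expression to a float32 and a float64 copy of the same matrix and checks whether either one changes.

```python
import numpy as np

rng = np.random.RandomState(0)

# Small similarity matrix in both precisions (illustrative values only).
S32 = np.array([[1, 0], [0, 1]], dtype=np.float32)
S64 = S32.astype(np.float64)

# Noise built with float64 constants, as in the line quoted above.
noise = ((np.finfo(np.double).eps * S64 + np.finfo(np.double).tiny * 100)
         * rng.randn(2, 2))

# Add the noise in place, keeping each array's own dtype.
S32_noisy = S32.copy()
S32_noisy += noise
S64_noisy = S64.copy()
S64_noisy += noise

# The noise is far below float32 resolution, so the float32 matrix is untouched.
changed32 = not np.array_equal(S32_noisy, S32)
changed64 = not np.array_equal(S64_noisy, S64)
print("float32 changed:", changed32)
print("float64 changed:", changed64)
```

The float64 copy is perturbed, while the float32 copy comes back bit-for-bit identical, so ties between identical similarities are never broken.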
The ideal solution is to determine the dtype of the input matrix and use it throughout the code. Additionally, the other variables (A, R, tmp, e) should be declared using the same dtype as the user input.
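A rough sketch of what dtype-aware noise could look like is shown below. The helper name add_degeneracy_noise is hypothetical and is not part of scikit-learn; it only shows the idea of taking eps and tiny from the input's own dtype rather than from np.double.

```python
import numpy as np


def add_degeneracy_noise(S, random_state):
    """Hypothetical helper: perturb S with noise scaled to S's own dtype."""
    info = np.finfo(S.dtype)  # eps and tiny for float32 or float64
    noise = (info.eps * S + info.tiny * 100) * random_state.randn(*S.shape)
    # Cast the noise back so the result keeps the caller's dtype.
    return S + noise.astype(S.dtype)


S = np.array([[1, 0], [0, 1]], dtype=np.float32)
S_noisy = add_degeneracy_noise(S, np.random.RandomState(0))
print(S_noisy.dtype)
print(np.array_equal(S_noisy, S))
```

With the float32 constants the zero entries receive a tiny nonzero perturbation, so the matrix is no longer exactly degenerate and the result stays in float32.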
Steps/Code to Reproduce
Here is a very simple example where you can intuitively see there should be 3 clusters.
import sklearn.cluster
import numpy as np
k = np.array([[1,0,0,0],
[0,1,1,0],
[0,1,1,0],
[0,0,0,1]], dtype='float32')
afp = sklearn.cluster.AffinityPropagation(preference=1, affinity='precomputed').fit(k)
print(afp.labels_)
Expected Results
array([0, 1, 1, 2], dtype=int64)
If k is float64, it gives the correct results.
Actual Results
array([0, 0, 0, 1], dtype=int64)
Versions
Windows-10-10.0.16299-SP0
Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 bit (AMD64)]
NumPy 1.14.0
SciPy 1.0.0
Scikit-Learn 0.19.1