Description
Description
I am using euclidean distances in a project and after updating, the result is wrong for just one of several datasets. When comparing it to scipy.spatial.distance.cdist one can see that in version 21.1 it behaves substantially different to 20.3.
The matrix is an ndarray with size (100,10000) with float32.
Steps/Code to Reproduce
from sklearn.metrics.pairwise import euclidean_distances
import sklearn
from scipy.spatial.distance import cdist
import matplotlib.pyplot as plt
import numpy as np
X = np.load('wont.npy')
ed = euclidean_distances(X)
ed_ = cdist(X, X, metric='euclidean')
plt.plot(np.sort(ed.flatten()), label='euclidean_distances sklearn {}'.format(sklearn.__version__))
plt.plot(np.sort(ed_.flatten()), label='cdist')
plt.yscale('symlog', linthreshy=1E3)
plt.legend()
plt.show()
The data are in this zip
wont.zip
Expected Results
Can be found when using sklearn 20.3, both behave identical.
sklearn20.pdf
Actual Results
When using version 21.1 has many 0 entries and some unreasonably high entries
sklearn_v21.pdf
Versions
Sklearn 21
System:
python: 3.6.7 (default, Oct 22 2018, 11:32:17) [GCC 8.2.0]
executable: /home/lenz/PycharmProjects/pyrolmm/venv_sklearn21/bin/python3
machine: Linux-4.15.0-50-generic-x86_64-with-Ubuntu-18.04-bionic
BLAS:
macros: HAVE_CBLAS=None, NO_ATLAS_INFO=-1
lib_dirs: /usr/lib/x86_64-linux-gnu
cblas_libs: cblas
Python deps:
pip: 9.0.1
setuptools: 39.0.1
sklearn: 0.21.1
numpy: 1.16.3
scipy: 1.3.0
Cython: None
pandas: None
For sklearn 20.3 the versions are:
System:
python: 3.6.7 (default, Oct 22 2018, 11:32:17) [GCC 8.2.0]
executable: /home/lenz/PycharmProjects/pyrolmm/venv_sklearn20/bin/python3
machine: Linux-4.15.0-50-generic-x86_64-with-Ubuntu-18.04-bionic
BLAS:
macros: HAVE_CBLAS=None, NO_ATLAS_INFO=-1
lib_dirs: /usr/lib/x86_64-linux-gnu
cblas_libs: cblas
Python deps:
pip: 9.0.1
setuptools: 39.0.1
sklearn: 0.20.3
numpy: 1.16.3
scipy: 1.3.0
Cython: None
pandas: None