8000 [MRG] Fix euclidean_distances numerical instabilities by jeremiedbb · Pull Request #13410 · scikit-learn/scikit-learn · GitHub

[MRG] Fix euclidean_distances numerical instabilities #13410


Closed
wants to merge 35 commits

Conversation

jeremiedbb
Member
@jeremiedbb jeremiedbb commented Mar 7, 2019

Fixes #9354
Should close #11271 eventually
Should close #12128

This PR follows the discussion in #9354, i.e.

  • for n_features <= 16, use the usual exact method to compute distances: d(x,y)² = ||x-y||²

  • for n_features > 16, keep the fast method d(x,y)² = ||x||² - 2x·y + ||y||², but upcast to float64 if necessary (a small sketch of the instability follows the list).
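A minimal NumPy sketch of the instability (illustrative values, not code from this PR): the expanded formula cancels catastrophically in float32 when points are close relative to their magnitude, and upcasting to float64 recovers the exact result.

import numpy as np

# Two float32 points that are close relative to their magnitude.
x = np.array([10000.0, 10000.1], dtype=np.float32)
y = np.array([10000.1, 10000.0], dtype=np.float32)

exact = np.sum((x - y) ** 2)          # ||x - y||², stable
fast32 = x @ x - 2 * (x @ y) + y @ y  # ||x||² - 2x·y + ||y||², cancels in float32

x64, y64 = x.astype(np.float64), y.astype(np.float64)
fast64 = x64 @ x64 - 2 * (x64 @ y64) + y64 @ y64  # same formula after upcast

print(exact, fast32, fast64)  # fast32 is far off; fast64 matches exact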

A couple of remarks:

  • When n_features <= 32 and one of the matrices is sparse while the other is not, the performance is pretty good, but I couldn't get good performance when both are sparse. In that case I densify the smaller array. I think this is fine and won't cause a big increase in memory usage. The reason is as follows:
    Worst case, both arrays have the same number of samples N. Then the ratio between the memory usage of the densified array and that of the distance matrix is (N * n_features) / (N * N) = n_features / N <= 32 / N. So for a reasonable N >~ 1000, the increase is at most ~3%.

  • To achieve good performance in the exact dense/dense case I had to add the -Ofast compiler flag to enable more aggressive compiler optimizations (almost a 2x speedup); a build sketch follows.
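A hedged sketch of how such a flag can be passed to a Cython extension at build time; the module and file names below are illustrative, not the PR's actual layout.

from setuptools import Extension

# Illustrative extension definition; only extra_compile_args matters here.
ext = Extension(
    "sklearn.metrics._pairwise_fast",                # hypothetical module name
    sources=["sklearn/metrics/_pairwise_fast.pyx"],  # hypothetical source path
    extra_compile_args=["-Ofast"],                   # aggressive optimizations
)

Note that -Ofast implies -ffast-math, which relaxes strict IEEE 754 semantics (e.g. assumptions about NaN and Inf), so the speedup is a trade-off rather than a free win.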

Benchmarks:

  • low dimension
    n_samples_X = 100000, n_samples_Y = 100, n_features = 10

    | dtype   | inputs        | cdist            | master           | PR               |
    |---------|---------------|------------------|------------------|------------------|
    | float32 | dense/dense   | 102 ms ± 184 µs  | 74.4 ms ± 303 µs | 99.5 ms ± 173 µs |
    | float32 | sparse/dense  | /                | 96.3 ms ± 809 µs | 101 ms ± 217 µs  |
    | float32 | sparse/sparse | /                | 110 ms ± 5.44 ms | 102 ms ± 804 µs  |
    | float64 | dense/dense   | 98.4 ms ± 518 µs | 122 ms ± 256 µs  | 134 ms ± 1.2 ms  |
    | float64 | sparse/dense  | /                | 165 ms ± 673 µs  | 103 ms ± 204 µs  |
    | float64 | sparse/sparse | /                | 170 ms ± 11.6 ms | 103 ms ± 92.4 µs |
  • high dimension
    n_samples_X = 100000, n_samples_Y = 100, n_features = 1000

    | dtype   | inputs        | cdist            | master           | PR               |
    |---------|---------------|------------------|------------------|------------------|
    | float32 | dense/dense   | 8.22 s ± 19.4 ms | 406 ms ± 6.48 ms | 1.11 s ± 2.86 ms |
    | float32 | sparse/dense  | /                | 433 ms ± 902 µs  | 860 ms ± 26.8 ms |
    | float32 | sparse/sparse | /                | 879 ms ± 2.78 ms | 1.02 s ± 970 µs  |
    | float64 | dense/dense   | 8.08 s ± 230 ms  | 631 ms ± 11.3 ms | 1.01 s ± 12.3 ms |
    | float64 | sparse/dense  | /                | 701 ms ± 2.3 ms  | 758 ms ± 12.4 ms |
    | float64 | sparse/sparse | /                | 932 ms ± 14.5 ms | 995 ms ± 4.59 ms |
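A minimal sketch of how timings like these can be reproduced; the exact benchmark script isn't shown here, so the sizes and setup below are assumptions taken from the description above.

import numpy as np
from timeit import timeit
from scipy.spatial.distance import cdist
from sklearn.metrics.pairwise import euclidean_distances

rng = np.random.RandomState(0)
X = rng.random_sample((100000, 10)).astype(np.float32)  # low-dimension case
Y = rng.random_sample((100, 10)).astype(np.float32)

# Average over a few runs; cdist has no sparse support, hence the "/" cells.
print("euclidean_distances:", timeit(lambda: euclidean_distances(X, Y), number=10) / 10)
print("cdist:", timeit(lambda: cdist(X, Y), number=10) / 10)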

@jnothman
Member

Thanks for this!! I think our main priority here is numerical stability, while ideally maintaining good performance where we can. Let us know when you want review.

distances = euclidean_distances(X, Y)

assert_allclose(distances, expected, rtol=1e-6)
assert distances.dtype == dtype
Member Author

Tests don't pass with rtol=1e-7 (the default) in float32, probably because scipy upcasts to float64 to compute distances, which can cause differences in the last significant digit.

I think an rtol of 1e-6 is enough for float32. What do you think?
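For context, a quick check of the precision at stake (eps is the relative spacing of adjacent floats):

import numpy as np

print(np.finfo(np.float32).eps)  # ~1.19e-07 -> about 7 significant decimal digits
print(np.finfo(np.float64).eps)  # ~2.22e-16 -> about 16 significant decimal digits

# rtol=1e-7 sits at the very edge of float32 precision, while rtol=1e-6
# tolerates a difference in the last significant digit.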

@jeremiedbb jeremiedbb changed the title [WIP] Fix euclidean_distances numerical unstabilities [MRG] Fix euclidean_distances numerical unstabilities Mar 11, 2019
@jeremiedbb
Member Author

@jnothman @amueller @rth I think it's ready for review.

@jnothman
Member

The ci/circleci: doc-min-dependencies failure may be real? Could you have generated infinite distances, for instance?

@jnothman jnothman changed the title [MRG] Fix euclidean_distances numerical unstabilities [MRG] Fix euclidean_distances numerical instabilities Mar 12, 2019
Member
@jnothman jnothman left a comment

Nice work! Please add a what's new entry.

return np.asarray(D)


cdef floating _euclidean_dense_dense_exact_1d(floating *x,
Member

Is this much better than using a vectorized implementation?

Member Author

Pairwise distances can't be simply vectorized.
The NumPy way would be:

D = ((X[:,:,np.newaxis] - Y.T[np.newaxis,:,:])**2).sum(axis=1)

but it takes a lot more memory and is 10x slower.

Another possibility is to do:

for i in range(n_samples_X):
    for j in range(n_samples_Y):
        D[i, j] = ((X[i] - Y[j])**2).sum()

But calling the NumPy API in a tight C loop like this is catastrophic: it's 1000x slower.

So as far as I know, the implementation I did is the only way to stay close to cdist performance. If you know a way to use a vectorized implementation, I'll take it.
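A minimal check of the broadcast approach's memory cost (array sizes here are arbitrary, chosen only for illustration):

import numpy as np
from scipy.spatial.distance import cdist

n_X, n_Y, n_features = 2000, 500, 10
rng = np.random.RandomState(0)
X = rng.rand(n_X, n_features)
Y = rng.rand(n_Y, n_features)

# The broadcast version materializes an (n_X, n_features, n_Y) temporary,
# n_features times larger than the final distance matrix.
diff = X[:, :, np.newaxis] - Y.T[np.newaxis, :, :]
D = (diff ** 2).sum(axis=1)

print(np.allclose(D, cdist(X, Y, metric='sqeuclidean')))  # True
print(diff.nbytes // 2**20, "MiB temporary vs", D.nbytes // 2**20, "MiB result")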

xe = xs + (chunk_size if i < n_chunks_X - 1 else n_samples_X_rem)

X_chunk = X[xs:xe].astype(np.float64)
XX_chunk = row_norms(X_chunk, squared=True)
Member

You won't use the XX and YY that were passed in?