⚠️ CI failed on Linux_Nightly.pylatest_pip_scipy_dev (last failure: Dec 22, 2024) ⚠️ · Issue #30509 · scikit-learn/scikit-learn

Closed
scikit-learn-bot opened this issue Dec 19, 2024 · 9 comments

scikit-learn-bot (Contributor) commented Dec 19, 2024

CI is still failing on Linux_Nightly.pylatest_pip_scipy_dev (Dec 22, 2024)

  • test_euclidean_distances_extreme_values[1000000-float32-0.0001-1e-05]
github-actions bot added the Needs Triage label on Dec 19, 2024
lesteve added the Bug label and removed the Needs Triage label on Dec 19, 2024
lesteve (Member) commented Dec 19, 2024

I can reproduce the issue with numpy-dev (no issue with the latest numpy 2.2 release):

pytest sklearn/metrics/tests/test_pairwise.py -k extreme_values
E       AssertionError: 
E       Not equal to tolerance rtol=1e-05, atol=0
E       
E       Mismatched elements: 1 / 1 (100%)
E       Max absolute difference among violations: 2.74553895e-05
E       Max relative difference among violations: 0.00027451
E        ACTUAL: array([[0.100044]], dtype=float32)
E        DESIRED: array([[0.100017]])
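The failing assertion is a relative-tolerance check. A minimal standalone sketch, using only the two reported values (not scikit-learn's test code), of what assert_allclose with rtol=1e-5 and atol=0 actually gates here:

```python
import numpy as np
from numpy.testing import assert_allclose

# The two values reported in the failure above.
actual = np.array([[0.100044]], dtype=np.float32)
desired = np.array([[0.100017]])

# assert_allclose checks |actual - desired| <= atol + rtol * |desired|.
rel = abs(float(actual[0, 0]) - float(desired[0, 0])) / abs(float(desired[0, 0]))
assert rel > 1e-5   # ~2.7e-4: outside the test's rtol=1e-5 ...
assert rel < 1e-3   # ... yet still a small relative change

try:
    assert_allclose(actual, desired, rtol=1e-5, atol=0)
    failed = False
except AssertionError:
    failed = True
assert failed
```

So the test is not off by orders of magnitude; the observed relative difference (~2.7e-4) merely exceeds a fairly tight tolerance.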

A git bisect seems to indicate that numpy/numpy#27883 is the culprit ... I commented in numpy/numpy#27883 (comment).

We are comparing scipy.cdist against our implementation of pairwise_distances:

@pytest.mark.parametrize(
    "dtype, eps, rtol",
    [
        (np.float32, 1e-4, 1e-5),
        pytest.param(
            np.float64,
            1e-8,
            0.99,
            marks=pytest.mark.xfail(reason="failing due to lack of precision"),
        ),
    ],
)
@pytest.mark.parametrize("dim", [1, 1000000])
def test_euclidean_distances_extreme_values(dtype, eps, rtol, dim):
    # check that euclidean distances is correct with float32 input thanks to
    # upcasting. On float64 there are still precision issues.
    X = np.array([[1.0] * dim], dtype=dtype)
    Y = np.array([[1.0 + eps] * dim], dtype=dtype)
    distances = euclidean_distances(X, Y)
    expected = cdist(X, Y)
    assert_allclose(distances, expected, rtol=1e-5)

This only fails with dtype=float32 and dim=1000000. The comparison used to pass with rtol=1e-5, and now the relative diff is 2.7e-4, so something looks kind of fishy.

scipy.cdist does not change, but our implementation somehow does with numpy/numpy#27883.

  • on failure:
distances=array([[0.10004405]], dtype=float32)
expected=array([[0.10001659]])
  • on success:
distances=array([[0.10001709]], dtype=float32)
expected=array([[0.10001659]])
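For context, a numpy-only sketch (illustrative, not scikit-learn's actual implementation) of why the test's comment mentions upcasting: the fast pairwise formula expands ||x - y||^2 as ||x||^2 + ||y||^2 - 2<x, y>, which cancels catastrophically in float32 when the distance is tiny relative to the norms, and recovers once the inputs are upcast to float64:

```python
import numpy as np

# The extreme-values scenario: every coordinate differs by ~1e-4, so the
# true distance (~0.1) is tiny compared to the squared norms (~1e6).
dim, eps = 1_000_000, 1e-4
x = np.full(dim, 1.0, dtype=np.float32)
y = np.full(dim, 1.0 + eps, dtype=np.float32)

def expanded_dist(a, b):
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 <a, b>; np.sum uses pairwise
    # summation, so the reductions themselves are fairly accurate -- the
    # damage comes from subtracting nearly equal ~1e6-sized numbers.
    sq = np.sum(a * a) + np.sum(b * b) - 2.0 * np.sum(a * b)
    return float(np.sqrt(np.maximum(sq, 0.0)))

# Direct float64 reference: the true distance is sqrt(dim) * eps ~ 0.1.
diff = x.astype(np.float64) - y.astype(np.float64)
reference = float(np.sqrt(np.sum(diff * diff)))

f32 = expanded_dist(x, y)  # float32 cancellation: far from the reference
upcast = expanded_dist(x.astype(np.float64), y.astype(np.float64))
```

In float32 the subtraction happens between numbers near 2e6, whose spacing (~0.125) is an order of magnitude larger than the true squared distance (~0.01), so the float32 result is essentially noise; the float64 path lands very close to the direct computation.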

seberg (Contributor) commented Dec 19, 2024

Moving here...

Actually, complete guess, but maybe in row_norms which does the equivalent of np.sqrt((X * X).sum(axis=1)) with a np.einsum?

def row_norms(X, squared=False):
    """Row-wise (squared) Euclidean norm of X.

    Equivalent to np.sqrt((X * X).sum(axis=1)), but also supports sparse
    matrices and does not create an X.shape-sized temporary.

    Performs no input validation.

    Parameters
    ----------
    X : array-like
        The input array.
    squared : bool, default=False
        If True, return squared norms.

    Returns
    -------
    array-like
        The row-wise (squared) Euclidean norm of X.
    """
    if sparse.issparse(X):
        X = X.tocsr()
        norms = csr_row_norms(X)
        if not squared:
            norms = np.sqrt(norms)
    else:
        xp, _ = get_namespace(X)
        if _is_numpy_namespace(xp):
            X = np.asarray(X)
            norms = np.einsum("ij,ij->i", X, X)
            norms = xp.asarray(norms)
        else:
            norms = xp.sum(xp.multiply(X, X), axis=1)
        if not squared:
            norms = xp.sqrt(norms)
    return norms
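As a sanity check on the docstring's claim, a small sketch showing that the einsum contraction matches the naive expression and np.linalg.norm:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 7))

# The einsum contracts X[i, j] * X[i, j] over j for each row i, without
# materializing the X * X temporary that (X * X).sum(axis=1) would create.
squared = np.einsum("ij,ij->i", X, X)
assert np.allclose(squared, (X * X).sum(axis=1))
assert np.allclose(np.sqrt(squared), np.linalg.norm(X, axis=1))
```

The results agree mathematically; the question in this issue is only about how the einsum path rounds during accumulation.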

So, I am a bit surprised by two things:

  • As far as I can tell, at least after calculating the differences, we should be using float64, and I am not sure if float64 should lead to this big a difference.
  • I expected einsum to always use naive summation and in that case there should be no difference between the branches. So if this is einsum and the difference is coming from there that may really need a closer look.

lesteve (Member) commented Dec 19, 2024

Thanks for your feedback! In case there are no further insights from @jjerphan or @jeremiedbb, I'll try to debug numpy vs numpy-dev to track down where the difference comes from, so that it is less about guessing.

jjerphan (Member) commented

I do not have time to look at it, unfortunately.

seberg (Contributor) commented Dec 19, 2024

I may try to step through the code here to see the likely change. I am still very confident this is just a tiny precision change (very likely for the better, even) due to different chunk sizes getting summed up.
It might be nice to just find out where this is happening for curiosity.

seberg (Contributor) commented Dec 19, 2024

OK, it is indeed the einsum: unlike normal summation, whose precision the change improved, einsum's precision got worse.

The reason is that the einsum code works the following way:

def inner(in1, in2, out):
    accum = 0
    for i1, i2 in zip(in1, in2):
        accum += i1 * i2

    out[0] += accum

Now, before the change, that for loop would go in chunks of 8196 and then add the result, which means that accum is bounded by ~8196. But after the change, the for loop will go to the full 100_000.
That is good for np.sum because it uses pairwise summation, but worse here, because any chunking gives a bit of the pairwise summation advantage.

While slightly worse for np.einsum("i,i->i"), we could just force that chunking in einsum for the "precision benefit".
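A rough numpy sketch of that effect (illustrative only, not einsum's C inner loop, and using 8192 as the chunk size): one float32 accumulator over the whole array versus accumulating in chunks and then adding the partial sums. Chunking keeps each accumulator's magnitude, and therefore each rounding step, smaller:

```python
import numpy as np

# Summing 0.1 repeatedly makes the rounding bias easy to see: each add
# rounds to the spacing of the accumulator's current magnitude.
n, chunk = 100_000, 8192
x = np.full(n, 0.1, dtype=np.float32)
reference = n * float(x[0])  # exact high-precision reference in float64

# Single accumulator over the full array: the accumulator grows to ~10000,
# where float32 spacing is ~1e-3, so each add loses bits of the 0.1 term.
acc = np.float32(0.0)
for v in x:
    acc = np.float32(acc + v)

# Chunked: partial accumulators stay below ~820, where the spacing is much
# finer; only n/chunk additions happen at the large magnitude.
total = np.float32(0.0)
for start in range(0, n, chunk):
    part = np.float32(0.0)
    for v in x[start:start + chunk]:
        part = np.float32(part + v)
    total = np.float32(total + part)

err_naive = abs(float(acc) - reference)
err_chunked = abs(float(total) - reference)
assert err_chunked < err_naive
```

This is the "bit of the pairwise summation advantage" mentioned above: any chunking bounds the accumulator magnitude for most of the additions.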

scikit-learn-bot changed the title's last-failure date to Dec 20, 2024
scikit-learn-bot changed the title's last-failure date to Dec 21, 2024
seberg (Contributor) commented Dec 21, 2024

Still not sure if the precision ask here is just a bit high (due to catastrophic cancellation that the docs warn about) considering that this already uses 64bit floats internally, but the behavior is "reverted" in NumPy now.
(In fact, if it was using 32bit floats internally, the NumPy change would have improved things a lot, because einsum uses a 64bit temporary accumulation.)

The precision here also potentially depends on the memory layout if X and Y are 2-dimensional...

scikit-learn-bot changed the title's last-failure date to Dec 22, 2024
scikit-learn-bot (Contributor, Author) commented

CI is no longer failing! ✅

Successful run on Dec 23, 2024

ogrisel (Member) commented Dec 23, 2024

Thanks @seberg. Since the CI is green again after the merge of numpy/numpy#28043, I think we can close.

Still not sure if the precision ask here is just a bit high (...)
The precision here also potentially depends on the memory layout if X and Y are 2-dimensional...

Let's keep that in mind in case future changes cause this test to fail again.
