Array API tests fail on main · Issue #29396 · scikit-learn/scikit-learn · GitHub
Closed
StefanieSenger opened this issue Jul 3, 2024 · 22 comments · Fixed by #29436
Comments

@StefanieSenger
Contributor
StefanieSenger commented Jul 3, 2024

Describe the bug

I ran the Array API tests on main and got 10 failing tests.
(Last week, with an older main and everything else the same, I had 4 failing tests.)

array_api_compat==1.7.1

I only ran the cpu tests.

Steps/Code to Reproduce

pytest sklearn/utils/tests/test_array_api.py

Expected Results

all tests pass

Actual Results

FAILED sklearn/utils/tests/test_array_api.py::test_get_namespace_ndarray_with_dispatch - AssertionError
FAILED sklearn/utils/tests/test_array_api.py::test_average[None-None-False-21-numpy-None-None] - AssertionError
FAILED sklearn/utils/tests/test_array_api.py::test_average_raises_with_invalid_parameters[0-weights1-TypeError-1D weights expected-numpy-None-None] - ValueError: Shape of weights must be consistent with shape of a along specified axis.
FAILED sklearn/utils/tests/test_array_api.py::test_average_raises_with_invalid_parameters[0-weights2-ValueError-Length of weights-numpy-None-None] - AssertionError: Regex pattern did not match.
FAILED sklearn/utils/tests/test_array_api.py::test_count_nonzero[None-None-csr_matrix-numpy-None-None] - AssertionError
FAILED sklearn/utils/tests/test_array_api.py::test_count_nonzero[None-None-csr_array-numpy-None-None] - AssertionError
FAILED sklearn/utils/tests/test_array_api.py::test_count_nonzero[int-None-csr_matrix-numpy-None-None] - AssertionError
FAILED sklearn/utils/tests/test_array_api.py::test_count_nonzero[int-None-csr_array-numpy-None-None] - AssertionError
FAILED sklearn/utils/tests/test_array_api.py::test_count_nonzero[float-None-csr_matrix-numpy-None-None] - AssertionError
FAILED sklearn/utils/tests/test_array_api.py::test_count_nonzero[float-None-csr_array-numpy-None-None] - AssertionError
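
For context, here is a rough sketch of the kind of dispatch check behind the first failure (test_get_namespace_ndarray_with_dispatch). It is an illustration rather than the exact test body, and assumes array_api_compat is installed:

import numpy as np
import array_api_compat.numpy as np_compat

from sklearn import config_context
from sklearn.utils._array_api import get_namespace

X = np.asarray([[1.0, 2.0], [3.0, 4.0]])

with config_context(array_api_dispatch=True):
    xp, is_array_api_compliant = get_namespace(X)
    # Expectation behind the test: with dispatch enabled, a plain NumPy array
    # should be routed through the array-api-compat wrapper namespace. This is
    # the kind of assertion that started failing with numpy 2.
    assert is_array_api_compliant
    assert xp is np_compat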

Versions

System:
    python: 3.12.2 (main, Apr 18 2024, 11:14:27) [GCC 13.2.1 20230801]
executable: /home/stefanie/.pyenv/versions/3.12.2/envs/scikit-learn_dev/bin/python
   machine: Linux-6.9.5-arch1-1-x86_64-with-glibc2.39

Python dependencies:
      sklearn: 1.6.dev0
          pip: 24.0
   setuptools: 69.5.1
        numpy: 2.1.0.dev0
        scipy: 1.13.0
       Cython: 3.0.10
       pandas: 2.2.2
   matplotlib: 3.8.4
       joblib: 1.4.0
threadpoolctl: 3.4.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 14
         prefix: libopenblas
       filepath: /home/stefanie/.pyenv/versions/3.12.2/envs/scikit-learn_dev/lib/python3.12/site-packages/scipy.libs/libopenblasp-r0-24bff013.3.26.dev.so
        version: 0.3.26.dev
threading_layer: pthreads
   architecture: Haswell

       user_api: openmp
   internal_api: openmp
    num_threads: 14
         prefix: libgomp
       filepath: /usr/lib/libgomp.so.1.0.0
        version: None

Also just tested with numpy==2.0.0, and the same failures.

StefanieSenger added the Bug and Needs Triage labels on Jul 3, 2024
@lesteve
Member
lesteve commented Jul 3, 2024

I do have the same issues locally; the problem seems to disappear when using numpy<2 (tried with numpy 1.26.4) for some reason ...

Maybe @betatim or @OmarManzoor have some suggestions about this and whether this is expected?

lesteve added the Array API label and removed the Needs Triage label on Jul 3, 2024
@OmarManzoor
Contributor

@lesteve Might be because of some updates in the new version of numpy. Will have to check this.

@lesteve
Member
lesteve commented Jul 3, 2024

For completeness, one of the potential reasons this was not noticed before is that the lock-file in the CI for the array API tests does use numpy 1.26.4 (I think because we require pytorch=1.13, which wants numpy<2):

❯ rg numpy build_tools/azure/pylatest_conda_forge_mkl_linux-64_conda.lock 
209:https://conda.anaconda.org/conda-forge/linux-64/numpy-1.26.4-py311h64a7726_0.conda#a502d7aad449a1206efb366d6a12c52d

@betatim
Member
betatim commented Jul 4, 2024

The tests Stefanie ran don't require a GPU, so I thought we ran them in the normal CI as well.

I'll investigate the failures. If someone else has time/interest to look at whether the tests run as part of the normal CI or not, that would be nice. Otherwise I'll also look into that.

In general I feel like we need to up our game around the array API tests. My (perceived) impression is that they constantly surprise us with "Ha, you thought the tests pass ... surprise! Tests don't actually pass!!" - which is a bit of a time suck because you have these unscheduled "ok, let's fix up the tests once again" sessions :-/

@betatim
Member
betatim commented Jul 4, 2024

xref numpy/numpy#26850 which I found while working on this

@betatim
Member
betatim commented Jul 5, 2024

Waiting for reactions on the above issue before trying to fix things. We could remove the check that the thing returned from xp.sum is on the same device as the input. But it isn't clear to me if what we see is a bug in Numpy or us being too strict in our check.
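
For reference, a minimal sketch of the check in question (illustrative only, not the exact test code), assuming array_api_compat is installed and using sklearn's internal device helper:

import numpy as np
from array_api_compat import array_namespace

from sklearn.utils._array_api import device

xp = array_namespace(np.asarray([0.0]))  # the array-api-compat numpy namespace
X = xp.asarray([[1.0, 2.0], [3.0, 4.0]])

# The check under discussion: the result of a reduction such as xp.sum should
# report the same device as its input. Removing the check would hide the
# discrepancy rather than explain it.
assert device(xp.sum(X)) == device(X)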

@ogrisel
Member
ogrisel commented Jul 8, 2024

In general I feel like we need to up our game around the array API tests. My (perceived) impression is that they constantly surprise us with "Ha, you thought the tests pass ... surprise! Tests don't actually pass!!" - which is a bit of a time suck because you have these unscheduled "ok, let's fix up the tests once again" sessions :-/

I think part of this feeling is addressed by the CUDA CI. The other part would be addressed by configuring a new PR job to run the tests on a GitHub Actions M1 worker, but this requires some effort.

The fact that pytorch is holding us back on numpy<2 is temporary. I don't think we should blame our CI config for this particular inability of our CI to test with the latest numpy. The only alternative would be to have a second array API enabled CI config with only array-api-strict and array-api-compat as extra dependencies and nothing else. I am not sure if it's worth the extra CI config complexity though.

@lesteve
Member
lesteve commented Jul 8, 2024

The only alternative would be to have a second array API enabled CI config with only array-api-strict and array-api-compat as extra dependencies and nothing else. I am not sure if it's worth the extra CI config complexity though.

Tim's opinion was that it probably makes sense to have it working for the latest versions IIUC: #29387 (comment)

@ogrisel
Member
ogrisel commented Jul 8, 2024

I agree we should not bother supporting older versions of array-api-strict. Still, the fact that we did not detect the problem with numpy 2 is because our CI configurations that depend on array-api-strict and array-api-compat also depend on PyTorch, which introduces the numpy<2 constraint that hid the problem with numpy 2.

But I agree we should fix the "TypeError: array iteration is not allowed in array-api-strict" problem ASAP, as discovered in #29373.

Back to the numpy 2 problem, the upstream fix is being developed at:

but it's likely that it won't be shipped before numpy 2.1. Is it fine to wait, or do we want to add some compat layer for numpy 2 in our internal numpy wrapper?

EDIT: or maybe we should just fix/skip a few tests for numpy 2.0.
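
For illustration, one hypothetical shape such a compat layer could take: a thin wrapper that delegates to the real namespace and overrides only the calls whose behaviour changed with numpy 2. The class below is made up for this sketch; it is not scikit-learn's actual wrapper and not necessarily what #29436 ends up doing:

class _NumPy2CompatWrapper:
    """Hypothetical shim around an array namespace for numpy 2 quirks."""

    def __init__(self, xp):
        self._xp = xp

    def __getattr__(self, name):
        # Everything not explicitly overridden is delegated to the wrapped
        # namespace (asarray, reshape, mean, ...).
        return getattr(self._xp, name)

    def sum(self, x, *args, **kwargs):
        # Re-wrap the reduction result as an array of the wrapped namespace so
        # downstream device/dtype checks see a consistent type across
        # numpy 1.x and 2.x.
        return self._xp.asarray(self._xp.sum(x, *args, **kwargs))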

@betatim
Member
betatim commented Jul 8, 2024

It was more a general feeling that with array API related things we hit the "we thought this worked but it turns out it doesn't" situation more often than on the non array API side. Which is what provides the motivation for working on exciting things like the CUDA CI ;)

@betatim
Member
betatim commented Jul 8, 2024

Still, the fact that we did not detect the problem with numpy 2 is because our CI configurations that depend on array-api-strict and array-api-compat also depend on PyTorch, which introduces the numpy<2 constraint that hid the problem with numpy 2.

What about installing array-api-strict and array-api-compat in (some of) our normal CPU based CI?

@ogrisel
Member
ogrisel commented Jul 8, 2024

What about installing array-api-strict and array-api-compat in (some of) our normal CPU based CI?

Let's do that then. Let's find a config that is already running numpy 2.

@lesteve
Member
lesteve commented Jul 8, 2024

Was there a good reason to have pytorch=1.13? If not, I would first try to remove the PyTorch pin ...

Running pytest sklearn/utils/tests/test_array_api.py locally with pytorch-cpu 2.3 and numpy 2.0 does not seem to create more failures than in the top message.

@ogrisel
Member
ogrisel commented Jul 8, 2024

It's only pinned in pylatest_conda_forge_mkl_linux-64 and not in pylatest_conda_forge_cuda_array-api_linux-64, so it's probably not needed anymore.

@betatim
Member
betatim commented Jul 8, 2024

Having said that (array-api-compat in CPU CI), scikit-learn/build_tools/azure/pylatest_conda_forge_mkl_linux-64_environment.yml already contains array-api-compat. But we install it from conda-forge, so maybe what we really need is to switch to using the PyPI package, or be patient for conda-forge to catch up.

@ogrisel
Member
ogrisel commented Jul 8, 2024

pylatest_conda_forge_mkl_linux also has pytorch and therefore cannot test with numpy>=2. We need to configure an extra entry with array-api-compat and array-api-strict but no pytorch so that we can test the array-api-strict + numpy>=2 combo.

@ogrisel
Member
ogrisel commented Jul 8, 2024

I started a PR to see if it's easy to fix numpy 2 support. I am testing locally. Feel free to open another PR for the lock files concurrently, otherwise I will do it in #29436 but probably not before tomorrow.

@lesteve
Member
lesteve commented Jul 8, 2024

About lock-files, I think we should merge #29388 first.

The lock-files were not updated for a while for a variety of reasons (the array-api-strict 2 update breaking things being one of them) and it would help to have them up-to-date. For example, I don't follow the thing about array-api-compat + conda-forge vs PyPI, but I am pretty sure this is confusion coming from the lock-files not having been updated for a while.

@ogrisel
Member
ogrisel commented Jul 9, 2024

About lock-files, I think we should merge #29388 first.

Merged. I will update my PR with main.

@ogrisel
Member
ogrisel commented Jul 9, 2024

I will add array-api-strict without pytorch to one of the CPU builds on the CI as part of #29436.

@ogrisel
Member
ogrisel commented Jul 9, 2024

The CI now runs the tests successfully with array-api-strict and numpy 2.0.

@StefanieSenger please note that according to the output of sklearn.show_versions() you are running a dev version of numpy. While I believe this should still work most of the time, it's not guaranteed, and I would advise using a stable version of numpy in case of new test failures that are not reproduced by the regular CI on your PRs.

@StefanieSenger
Contributor Author

@StefanieSenger please note that according to the output of sklearn.show_versions() you are running a dev version of numpy. While I believe this should still work most of the time, it's not guaranteed, and I would advise using a stable version of numpy in case of new test failures that are not reproduced by the regular CI on your PRs.

Yes, thank you. I noticed this after submitting the issue, which is why I edited it afterwards to add that I get the same failures with numpy 2.0.
