8000 Make the array API check strict for unit tests by betatim · Pull Request #29571 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

Make the array API check strict for unit tests #29571

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

betatim
Copy link
Member
@betatim betatim commented Jul 26, 2024

This means tests will not run and raise an error if one of the (possibly) optional dependencies is not installed or configured properly.

The idea is to keep the check flexible when it is used to check if we can run user code, but make it strict when running unit tests. For user code we don't know if we need the array API features of SciPy, so setting the environment variable is recommended but not required. For our unit tests we know we will test code that needs it and the warning is easily lost (by default hidden) and the error message you see regarding the failed test is not helpful for tracking down what you need to do to fix it.

Follow up to #29549

This means tests will not run and raise an error if one of the
(possibly) optional dependencies is not installed or configured
properly.
Copy link
github-actions bot commented Jul 26, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 7eaf420. Link to the linter CI: here

10000
@@ -295,7 +295,7 @@ def set_random_state(estimator, random_state=0):


try:
_check_array_api_dispatch(True)
_check_array_api_dispatch(True, strict=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means we have to be careful about not importing sklearn.utils._testing in any non-testing code. (I know we have done this in the past)

A safer route is to call _check_array_api_dispatch(True, strict=True) somewhere in conftest.py.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was already here, so should we move it or just make a note of this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On main, it was okay because _check_array_api_dispatch only raised warnings.

I was thinking of leaving this part alone and add an additional _check_array_api_dispatch(True, misconfigured_scipy="raise") in conftest.py.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also like this better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pondered this for a bit. I couldn't work out a good place to add it to in conftest.py as I couldn't find a good place where you know that "this is a test that requires array API, so lets run _check_array_api_dispatch()". Then I realised that this is also true for the current solution in _testing.py. It will raise an exception even if you never planned to use the array API.

So far we only skip tests when something is misconfigured, we don't raise an error/stop test execution.

Now I think the place to add _check_array_api_dispatch(True, misconfigured_scipy="raise") is in _array_api_for_tests, after checking for array_api_compat. Because I think this is a place where we know that "the user wants to run array API tests". And we will not reach this bit of code when the user doesn't want to run the array API tests.

The problem is, almost all array API related tests use _array_api_for_tests, but not all of them. It looks like the ones that don't use it use @skip_if_array_api_compat_not_configured but also explicitly test something lower level about the numpy wrapper/array API machinery. So probably fine?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if the simplest thing to do is not have a conftest.py (not sklearn/conftest.py this is too late scipy has already been imported at this stage) that sets SCIPY_ARRAY_API=1 as mentioned at the end of #29549 (comment). This will likely not work when we run pytest --pyargs sklearn outside of the repo root (conftest.py will not be found), but this may be an acceptable solution during the scipy array API transition.

Alternative with the sklearn/conftest.py route which I kind of prefer too, in part because in my experience pytest is flexible enough that you can manage to do a reasonable thing in finite-time. I think something reasonable although a bit hacky would be the following:

In pytest_collection_modifyitems(config, items) you have all the test names ([item.name for item in items]) and if one of them contains array_api then you call _check_array_api_dispatch(True, misconfigured_scipy="raise").

Maybe a combination of both? Sorry I haven't had time to think this completely through through so I am mostly giving rough directions ...

if os.environ.get("SCIPY_ARRAY_API") != "1":
warnings.warn(

if not scipy._lib._array_api.SCIPY_ARRAY_API:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since is this private API I would do something link:

try:
   scipy_array_api_enabled = scipy._lib._array_api.SCIPY_ARRAY_API
except AttributeError:
   warnings.warn(
       "Could not inspect if scipy array API support is enabled or not, assuming it is."
   )
   scipy_array_api_enabled = True

if not scipy_array_api_enabled:
   message = ...

Otherwise, a future version of scipy where array API support is enabled by default and does not define this private attribute anymore might break scikit-learn.

@ogrisel
Copy link
Member
ogrisel commented Sep 13, 2024

We merged a more radical but simpler solution in #29814 so I propose to close this. If #29814 proves too annoying, we can reopen and adapt this PR instead.

@ogrisel ogrisel closed this Sep 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0