Make the array API check strict for unit tests#29571
Make the array API check strict for unit tests#29571betatim wants to merge 4 commits intoscikit-learn:mainfrom
Conversation
This means tests will not run and raise an error if one of the (possibly) optional dependencies is not installed or configured properly.
sklearn/utils/_testing.py
Outdated
|
|
||
| try: | ||
| _check_array_api_dispatch(True) | ||
| _check_array_api_dispatch(True, strict=True) |
There was a problem hiding this comment.
This means we have to be careful about not importing sklearn.utils._testing in any non-testing code. (I know we have done this in the past)
A safer route is to call _check_array_api_dispatch(True, strict=True) somewhere in conftest.py.
There was a problem hiding this comment.
It was already here, so should we move it or just make a note of this?
There was a problem hiding this comment.
On main, it was okay because _check_array_api_dispatch only raised warnings.
I was thinking of leaving this part alone and add an additional _check_array_api_dispatch(True, misconfigured_scipy="raise") in conftest.py.
There was a problem hiding this comment.
I've pondered this for a bit. I couldn't work out a good place to add it to in conftest.py as I couldn't find a good place where you know that "this is a test that requires array API, so lets run _check_array_api_dispatch()". Then I realised that this is also true for the current solution in _testing.py. It will raise an exception even if you never planned to use the array API.
So far we only skip tests when something is misconfigured, we don't raise an error/stop test execution.
Now I think the place to add _check_array_api_dispatch(True, misconfigured_scipy="raise") is in _array_api_for_tests, after checking for array_api_compat. Because I think this is a place where we know that "the user wants to run array API tests". And we will not reach this bit of code when the user doesn't want to run the array API tests.
The problem is, almost all array API related tests use _array_api_for_tests, but not all of them. It looks like the ones that don't use it use @skip_if_array_api_compat_not_configured but also explicitly test something lower level about the numpy wrapper/array API machinery. So probably fine?
There was a problem hiding this comment.
I am wondering if the simplest thing to do is not have a conftest.py (not sklearn/conftest.py this is too late scipy has already been imported at this stage) that sets SCIPY_ARRAY_API=1 as mentioned at the end of #29549 (comment). This will likely not work when we run pytest --pyargs sklearn outside of the repo root (conftest.py will not be found), but this may be an acceptable solution during the scipy array API transition.
Alternative with the sklearn/conftest.py route which I kind of prefer too, in part because in my experience pytest is flexible enough that you can manage to do a reasonable thing in finite-time. I think something reasonable although a bit hacky would be the following:
In pytest_collection_modifyitems(config, items) you have all the test names ([item.name for item in items]) and if one of them contains array_api then you call _check_array_api_dispatch(True, misconfigured_scipy="raise").
Maybe a combination of both? Sorry I haven't had time to think this completely through through so I am mostly giving rough directions ...
| if os.environ.get("SCIPY_ARRAY_API") != "1": | ||
| warnings.warn( | ||
|
|
||
| if not scipy._lib._array_api.SCIPY_ARRAY_API: |
There was a problem hiding this comment.
Since is this private API I would do something link:
try:
scipy_array_api_enabled = scipy._lib._array_api.SCIPY_ARRAY_API
except AttributeError:
warnings.warn(
"Could not inspect if scipy array API support is enabled or not, assuming it is."
)
scipy_array_api_enabled = True
if not scipy_array_api_enabled:
message = ...Otherwise, a future version of scipy where array API support is enabled by default and does not define this private attribute anymore might break scikit-learn.
This means tests will not run and raise an error if one of the (possibly) optional dependencies is not installed or configured properly.
The idea is to keep the check flexible when it is used to check if we can run user code, but make it strict when running unit tests. For user code we don't know if we need the array API features of SciPy, so setting the environment variable is recommended but not required. For our unit tests we know we will test code that needs it and the warning is easily lost (by default hidden) and the error message you see regarding the failed test is not helpful for tracking down what you need to do to fix it.
Follow up to #29549