-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
Make the array API check strict for unit tests #29571
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This means tests will not run and raise an error if one of the (possibly) optional dependencies is not installed or configured properly.
sklearn/utils/_testing.py
Outdated
@@ -295,7 +295,7 @@ def set_random_state(estimator, random_state=0): | |||
|
|||
|
|||
try: | |||
_check_array_api_dispatch(True) | |||
10000 | _check_array_api_dispatch(True, strict=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means we have to be careful about not importing sklearn.utils._testing
in any non-testing code. (I know we have done this in the past)
A safer route is to call _check_array_api_dispatch(True, strict=True)
somewhere in conftest.py
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was already here, so should we move it or just make a note of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On main
, it was okay because _check_array_api_dispatch
only raised warnings.
I was thinking of leaving this part alone and add an additional _check_array_api_dispatch(True, misconfigured_scipy="raise")
in conftest.py
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also like this better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pondered this for a bit. I couldn't work out a good place to add it to in conftest.py
as I couldn't find a good place where you know that "this is a test that requires array API, so lets run _check_array_api_dispatch()
". Then I realised that this is also true for the current solution in _testing.py
. It will raise an exception even if you never planned to use the array API.
So far we only skip tests when something is misconfigured, we don't raise an error/stop test execution.
Now I think the place to add _check_array_api_dispatch(True, misconfigured_scipy="raise")
is in _array_api_for_tests
, after checking for array_api_compat. Because I think this is a place where we know that "the user wants to run array API tests". And we will not reach this bit of code when the user doesn't want to run the array API tests.
The problem is, almost all array API related tests use _array_api_for_tests
, but not all of them. It looks like the ones that don't use it use @skip_if_array_api_compat_not_configured
but also explicitly test something lower level about the numpy wrapper/array API machinery. So probably fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering if the simplest thing to do is not have a conftest.py
(not sklearn/conftest.py
this is too late scipy
has already been imported at this stage) that sets SCIPY_ARRAY_API=1
as mentioned at the end of #29549 (comment). This will likely not work when we run pytest --pyargs sklearn
outside of the repo root (conftest.py
will not be found), but this may be an acceptable solution during the scipy array API transition.
Alternative with the sklearn/conftest.py
route which I kind of prefer too, in part because in my experience pytest is flexible enough that you can manage to do a reasonable thing in finite-time. I think something reasonable although a bit hacky would be the following:
In pytest_collection_modifyitems(config, items)
you have all the test names ([item.name for item in items]
) and if one of them contains array_api
then you call _check_array_api_dispatch(True, misconfigured_scipy="raise")
.
Maybe a combination of both? Sorry I haven't had time to think this completely through through so I am mostly giving rough directions ...
if os.environ.get("SCIPY_ARRAY_API") != "1": | ||
warnings.warn( | ||
|
||
if not scipy._lib._array_api.SCIPY_ARRAY_API: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since is this private API I would do something link:
try:
scipy_array_api_enabled = scipy._lib._array_api.SCIPY_ARRAY_API
except AttributeError:
warnings.warn(
"Could not inspect if scipy array API support is enabled or not, assuming it is."
)
scipy_array_api_enabled = True
if not scipy_array_api_enabled:
message = ...
Otherwise, a future version of scipy where array API support is enabled by default and does not define this private attribute anymore might break scikit-learn.
This means tests will not run and raise an error if one of the (possibly) optional dependencies is not installed or configured properly.
The idea is to keep the check flexible when it is used to check if we can run user code, but make it strict when running unit tests. For user code we don't know if we need the array API features of SciPy, so setting the environment variable is recommended but not required. For our unit tests we know we will test code that needs it and the warning is easily lost (by default hidden) and the error message you see regarding the failed test is not helpful for tracking down what you need to do to fix it.
Follow up to #29549