8000 Follow-up after mean_poisson_deviance array API PR · Issue #29549 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

Follow-up after mean_poisson_deviance array API PR #29549

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
lesteve opened this issue Jul 23, 2024 · 9 comments
Closed

Follow-up after mean_poisson_deviance array API PR #29549

lesteve opened this issue Jul 23, 2024 · 9 comments

Comments

@lesteve
Copy link
Member
lesteve commented Jul 23, 2024

As a follow-up for #29227.

The following command fails locally:

pytest -vl sklearn/metrics/tests/test_common.py -k 'api_regression_metric and mean_poisson_deviance'

see full error in #29227 (comment)

but setting SCIPY_ARRAY_API=1 works:

SCIPY_ARRAY_API=1 pytest -vl sklearn/metrics/tests/test_common.py -k 'api_regression_metric and mean_poisson_deviance'

Honestly I think this is fair to say that as a newcomer to the array API work, my feeling is that it can break way too easily in very mysterious ways. There are probably a few unrelated reasons for this but I guess since at the last bi-weekly meetings we agreed that we should try to pay attention to this kind of breakages and improve the situation, here are a few comments on this #29277:

  • I think we should have some tests with scipy>=1.14 and SCIPY_ARRAY_API unset. This seems something that could easily happen in practice and we should have tests that make sure that nothing breaks. It's not clear to me why only array-api-strict breaks and not pytorch example, maybe as the name implies array-api-strict is even stricter than other namespaces.
  • I don't think this is acceptable to skip the test for scipy<1.14 since you know a reasonable user may use array_api_dispatch=True with scipy<1.14. To be fair it is not clear to me what happens in this case. I guess scipy will transform the array into a numpy array, and maybe the code afterwards has issues because some arrays are not in the same namespace? Anyway, we should make sure that the user code does not break. If this is really too hard to achieve the alternative is to raise an exception saying something like "to enable array API dispatch with this particular function you need scipy 1.14 and SCIPY_ARRAY_API=1".
  • the warning about SCIPY_ARRAY_API seems too broad (as soon as set_config(array_api_dispatch=True) is called?). Could we not have warnings only in a few (at least for now) places where we know we call scipy functions?

Originally posted by @lesteve in #29227 (comment)

@github-actions github-actions bot added the Needs Triage Issue requires triage label Jul 23, 2024
@lesteve lesteve changed the title Follow-up after mean_poisson_deviance PR array API PR Follow-up after mean_poisson_deviance array API PR Jul 23, 2024
@lesteve lesteve added Array API and removed Needs Triage Issue requires triage labels Jul 24, 2024
@betatim
Copy link
Member
betatim commented Jul 24, 2024

We should probably add a check to our array API tests that raises an error/exception with a good message if the environment variable is not already set.

@betatim
Copy link
Member
betatim commented Jul 24, 2024
  • the warning about SCIPY_ARRAY_API seems too broad (as soon as set_config(array_api_dispatch=True) is called?). Could we not have warnings only in a few (at least for now) places where we know we call scipy functions?

I assume you mean

if os.environ.get("SCIPY_ARRAY_API") != "1":
warnings.warn(
(
"Some scikit-learn array API features might rely on enabling "
"SciPy's own support for array API to function properly. "
"Please set the SCIPY_ARRAY_API=1 environment variable "
"before importing sklearn or scipy. More details at: "
"https://docs.scipy.org/doc/scipy/dev/api-dev/array_api.html"
),
UserWarning,
)

I think it is fine to have a warning for everything. Especially as scipy plan to remove the need for setting this environment variable. So adding logic to our code only when needed feels like a lot of effort.

Did you see the warning when you ran the tests? Or is it lost in a sea of warnings? Asked differently, why did the warning we already have not prevent your negative experience when running the tests?

One thing we should add IMHO is a check on the minimum version of scipy that the user has installed. If your scipy version is too old, enabling this environment variable won't do anything for you. The newer the installed scipy version is, the more of it will support the array API. So I think this means we require a minimum version for scikit-learn to work. Which means we need a CI setup with that version to make sure we don't rely on things in scipy that are "too new". Should we pick scipy 1.14 as the minimum?

  • I think we should have some tests with scipy>=1.14 and SCIPY_ARRAY_API unset. This seems something that could easily happen in practice and we should have tests that make sure that nothing breaks.

For these tests array API support in scikit-learn would be turned off right?

  • I don't think this is acceptable to skip the test for scipy<1.14 since you know a reasonable user may use array_api_dispatch=True with scipy<1.14. To be fair it is not clear to me what happens in this case.

I think we should pick a minimum scipy version that you need to use array API. Mostly because "who knows what will happen" if people use an older one. We should implement a check like we have for Numpy in _check_array_api_dispatch for this minimum version

@lesteve
Copy link
Member Author
lesteve commented Jul 24, 2024

Did you see the warning when you ran the tests? Or is it lost in a sea of warnings? Asked differently, why did the warning we already have not prevent your negative experience when running the tests?

No I did not see the warnings because there wasn't any, see below for more details why.

I guess as discussed during the meeting, reducing the number of moving pieces would be a great start, so raising an exception when any of the following conditions hold seems probably like a good idea?

  • array-API support enabled, scipy<1.14
  • array-API support enabled, scipy>=1.14 but scipy._lib._array_api.SCIPY_ARRAY_API is False (Python variable which is set at import time based on the value of the environment variable SCIPY_ARRAY_API see https://github.com/scipy/scipy/blob/9525f45c1cb9068bcb8e02d8fd81abc0c6f4e4ae/scipy/_lib/_array_api.py#L31-L32). Not sure if the env variable is used in other places, but the Python variable is what we should always use I think because that's what is the source of truth to allow Scipy array API support, not the environment variable which can be changed later after scipy has been imported.

Why did I not see any warning?

  • it seems like warnings are not shown by default in our pytest setup, we have --disable-pytest-warnings in our setup.cfg (guess who did that, yours truly, 7+ years ago with a note saying to remove it, I guess it may be about time 😅).
  • to be honest I may have missed it anyway see screenshot below, it's very easy to get drawn by the red lines and ignore the "warnings summary" part. Here I am only running 4 tests but if I was running the full test suite forget about finding the relevant warning 😉
    image

@betatim
Copy link
Member
betatim commented Jul 24, 2024

I agree that the warning is essentially invisible. I knew it was somewhere in your screenshot but it took me two or three scans to actually find it :-/

Should we make the warning an error for tests?

@lesteve
Copy link
Member Author
lesteve commented Jul 24, 2024

Should we make the warning an error for tests?

This is certainly an option to make it an error only for tests. I guess you are implying that making it an error in the general case seems too aggressive?

Also unrelated, but as a developer I find it slightly annoying that now I have to know that I need to set SCIPY_ARRAY_API=1 otherwise some tests will fail (I have installed array-api-strict in my main scikit-learn dev environment because I feel like I am more likely to catch array API issues 😅). Maybe it is an argument for skipping the mean_poisson_deviance test if scipy array API support is not enabled. Maybe it's not worth it and I can also probably find a way to tweak my setup for this ...

Edit: for now I have found a work-around which is to have a untracked conftest.py at the root of the folder. I leave it untracked so I can switch easily between different branches and still have it:

import os

try:
    import array_api_strict
    os.environ["SCIPY_ARRAY_API"] = "1"
except ImportError:
    pass

@ogrisel
Copy link
Member
ogrisel commented Jul 26, 2024

Maybe it is an argument for skipping the mean_poisson_deviance test if scipy array API support is not enabled.

I am fine with this. This would mean catching RuntimeError in addition to ImportError in sklearn.utils._testing in #29571.

@betatim
Copy link
Member
betatim commented Jul 26, 2024

Having to know to set the environment variable is annoying indeed. Some might suggest that spin test could take care of this for us! But I think this is also exactly what I think is an anti-pattern for tooling like spin (we'd be using it as a crutch to deal with things which are not done well).

I wonder if there is some Python magic (but not too magic) we can sprinkle onto scipy to enable array API support after having imported it?

If we skip all tests that require array API support in scipy, won't we end up skipping a lot of tests in the future? Basically, I am wondering what is going to be more annoying: having to setup things like the env variable or learning that your work breaks lots of tests only after opening a PR?

@ogrisel
Copy link
Member
ogrisel commented Sep 9, 2024

I opened an issue upstream in scipy to ask for a programmatic way to do it: scipy/scipy#21529.

@ogrisel
Copy link
Member
ogrisel commented Sep 13, 2024

Closing this because in #29814.

@ogrisel ogrisel closed thi 4F73 s as completed Sep 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants
0