8000 DOC/TST: `cluster.hierarchy`: use `xp_capabilities` by crusaderky · Pull Request #22960 · scipy/scipy · GitHub
[go: up one dir, main page]

Skip to content

DOC/TST: cluster.hierarchy: use xp_capabilities #22960

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 17, 2025

Conversation

crusaderky
Copy link
Contributor
@crusaderky crusaderky commented May 9, 2025

Add xp_capabilities to scipy.cluster.hierarchy.

See Also

Demo render

image

@github-actions github-actions bot added scipy.stats scipy.special scipy.cluster scipy._lib scipy.constants array types Items related to array API support and input array validation (see gh-18286) Documentation Issues related to the SciPy documentation. Also check https://github.com/scipy/scipy.org maintenance Items related to regular maintenance tasks labels May 9, 2025
@crusaderky crusaderky added this to the 1.16.0 milestone May 9, 2025
@crusaderky crusaderky force-pushed the cluster_capabilities branch from f443522 to 8773ce9 Compare May 9, 2025 14:58
@crusaderky crusaderky changed the title DOC/TST: cluster: use xp_capabilities [DNM] DOC/TST: cluster: use xp_capabilities May 9, 2025
@crusaderky crusaderky marked this pull request as ready for review May 9, 2025 14:59
@crusaderky crusaderky force-pushed the cluster_capabilities branch 6 times, most recently from fb3bf78 to 9cbcf60 Compare May 16, 2025 18:05
@crusaderky crusaderky changed the title [DNM] DOC/TST: cluster: use xp_capabilities [DNM] DOC/TST: cluster.hierarchy: use xp_capabilities May 16, 2025
@crusaderky crusaderky force-pushed the cluster_capabilities branch from 9cbcf60 to 003d311 Compare May 16, 2025 21:53
@crusaderky crusaderky changed the title [DNM] DOC/TST: cluster.hierarchy: use xp_capabilities DOC/TST: cluster.hierarchy: use xp_capabilities May 16, 2025
@@ -299,9 +267,10 @@ def test_mlab_linkage_conversion_multiple_rows(self, xp):
rtol=1e-15)


@skip_xp_backends(cpu_only=True)
@make_xp_test_case(fclusterdata)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I noticed the addition of make_xp_test_case, but FWIW I find this much harder to parse intuitively than skip_xp_backends.

Things are getting pretty convoluted IMO.

Copy link
Contributor Author
@crusaderky crusaderky May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're coming at the back end of 6 months worth of iterations to make it as user-friendly as possible, given the requirements.

skip_xp_backends was no longer fit for purpose given the necessity to

  • keep documentation and tests aligned
  • have a central repository of which functions work on which backends
  • test lazy backends (jax.jit and Dask materialization)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, just pointing out my own difficulty in parsing what is going on FWIW.

Copy link
Contributor
@steppi steppi May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was going to say, I've spent time this week studying xp_capabilities to try to understand all of the ins and outs, and it solves very real problems. @tylerjereddy don't think of it as skip_xp_backends becoming convoluted or bloated through poor planning or something, it's more that this is bringing to attention real complexity in the problem space. Maintaining a single source of truth for which functions work on which backends and having the test cases drawn from that source should make things simpler as a whole.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think things are getting a bit complex and we should be cautious here WRT maintenance burden. It does seem like the promise of a single code path for all backends has melted away for the time being and the complexity creep is something we should keep an eye on.

IIRC adding support tables to every function individually vs. a single summary location is something else that I may have missed the discussion on. Anyway, I do think that the volume of work and its complexity mean that it can be hard for folks to keep up and chime in without being simply told they don't understand so just want to make sure we're cautious.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, just pointing out my own difficulty in parsing what is going on FWIW.

Noted. I agree there's room to document and explain more thoroughly.

Copy link
Contributor
@steppi steppi May 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC adding support tables to every function individually vs. a single summary location is something else that I may have missed the discussion on.

I'd missed the discussion too. It looks like it took place here, #21255. I wasn't able to find a related thread on the discourse but seems like there probably should have been one.

I had a lot of bandwidth to look into this over the past week, and following along with all of the subsequent PRs, I was able to come to grips on why things have taken the form they did, but most maintainers don't have that luxury and it must be a lot to have sprung on you at once. I think the general idea is not so complicated as it appears. There's a decorator which let's you specify what is and isn't supported, which takes functions and adds entries for them to a central capabilities_table, and also adds a table to the docs. The pytest decorator for selecting different backends then draws from this central table. There are also changes can be seen as orthogonal that address Jax and Dask's capabilities for lazy computation, which probably contribute to making the changes seem a little overwhelming at first look.

I don't think it's good for you to simply be told you don't understand either. Sorry for doing that. It's unfortunate that the discussion may have gone under the radar of more than a few maintainers. Having looked into it recently though, I've been convinced this is good stuff.

Copy link
Member
@lucascolley lucascolley May 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does seem like the promise of a single code path for all backends has melted away for the time being

Is this referring to paths which are added or skipped based on is_lazy_array(x)? Or are there other path splits?


We really need to update https://scipy.github.io/devdocs/dev/api-dev/array_api.html, but I haven't insisted on keeping it up to date at every stage due to the high iteration frequency. Now that we seem to be settling down to a pattern which covers what we need, it would be great to see it well documented for longevity.

Copy link
Member
@rgommers rgommers May 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really need to update https://scipy.github.io/devdocs/dev/api-dev/array_api.html, but I haven't insisted on keeping it up to date at every stage due to the high iteration frequency. Now that we seem to be settling down to a pattern which covers what we need, it would be great to see it well documented for longevity.

💯 on this. I think it's great how much we collectively got done over the past 6-12 months, and also that now (after the last PRs go into 1.16.x) is a great time to let the dust settle and focus for a bit on docs and review of code patterns, to ensure we all get back on the same page and are happy with how this has shaped up. I personally also had a hard time keeping up with all the changes, even though I had a lot of context.

I'll try and write a summary on gh-18286 and then post it on Discourse after I'm done with 1.16.x review duties.

EDIT: done in #18867 (comment) and this Discourse post

Copy link
Member
@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks quite good, the tables docs in the docs render well and look correct. Some local testing results from pixi r test-cpu -t scipy.cluster.tests.test_hierarchy to verify we didn't lose test coverage or performance:

  • CPU on main: 770 passed, 41 skipped in 42.12s
  • CPU on this PR: 770 passed, 41 skipped in 42.40s
  • GPU on main: 437 passed, 374 skipped in 29.92s
  • GPU on this PR: 437 passed, 374 skipped in 31.81s

So all works as advertised. Thanks @crusaderky & reviewers.

@rgommers
Copy link
Member

CI failures are unrelated, so in it goes.

@rgommers rgommers merged commit d9fa646 into scipy:main May 17, 2025
35 of 39 checks passed
@crusaderky crusaderky deleted the cluster_capabilities branch May 17, 2025 23:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
array types Items related to array API support and input array validation (see gh-18286) Documentation Issues related to the SciPy documentation. Also check https://github.com/scipy/scipy.org maintenance Items related to regular maintenance tasks scipy.cluster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0