ENH: Replace `_lazywhere` with `xpx.apply_where` #22557

crusaderky · 2025-02-19T14:08:09Z

Adds support for jax.jit
Adds support for Dask
Adds support for non-scalar fill_value
Slightly reworked, hopefully improved UI
Reverts the workaround introduced by #22393 (TST: stats: test support for array API compatible masked arrays) for mdhaber/marray#89 (full and similar should accept 0-d masked array as input). Unit tests remain all green. @mdhaber
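For readers unfamiliar with the pattern being replaced, here is a minimal NumPy-only sketch of the "lazy where" idea: the function is evaluated only on the elements selected by the condition, and everything else receives the fill value. The name `lazy_where_sketch` and its argument order are assumptions made for illustration; this is neither the removed `_lazywhere` nor the `xpx.apply_where` implementation, and it does not cover jax.jit or Dask.

```python
import operator

import numpy as np


def lazy_where_sketch(cond, args, f, fill_value):
    """Hypothetical illustration: apply ``f`` only where ``cond`` is true."""
    cond = np.asarray(cond, dtype=bool)
    # Broadcast the condition and all arguments to one common shape.
    args = np.broadcast_arrays(cond, *args)[1:]
    # Evaluate f on the selected elements only, so f never sees "bad" inputs.
    vals = f(*(a[cond] for a in args))
    # fill_value may be a scalar or an array broadcastable to cond's shape.
    out = np.full(cond.shape, fill_value, dtype=np.result_type(vals, fill_value))
    out[cond] = vals
    return out


total = np.asarray([6.0, 0.0])
n = np.asarray([3.0, 0.0])

# 0/0 is never computed, so no RuntimeWarning is emitted for the second element.
mean = lazy_where_sketch(n != 0, (total, n), operator.truediv, np.nan)

# A non-scalar fill value is applied element by element.
filled = lazy_where_sketch(n != 0, (total, n), operator.truediv,
                           np.asarray([-1.0, -2.0]))
```

By contrast, `np.where(n != 0, total / n, np.nan)` computes the division for every element first and only then discards the unwanted results.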

mdhaber · 2025-02-19T15:48:00Z

> Note that marray tests do not run in CI

They should be running in the array API job because we install marray there. Is that not happening?

mdhaber · 2025-02-19T15:56:14Z

To reduce the diff and potential for regressions, can we move the existing _lazywhere to stats._distn_infrastructure and use it there (with a note to use it only in the old distribution infrastructure and old distributions)? I don't foresee that code being updated for array API. I might give old distribution private methods array API support so they can be used with the new infrastructure, and at that time, I would swap out _lazywhere with apply_where.

crusaderky · 2025-02-19T16:04:58Z

scipy/stats/_stats_py.py

@@ -670,7 +668,7 @@ def tmean(a, limits=None, inclusive=(True, True), axis=None):
    # explicit dtype specification required due to data-apis/array-api-compat#152
    sum = xp.sum(a, axis=axis, dtype=a.dtype)
    n = xp.sum(xp.asarray(~mask, dtype=a.dtype), axis=axis, dtype=a.dtype)
-    mean = _lazywhere(n != 0, (sum, n), xp.divide, xp.nan)
+    mean = xpx.apply_where(n != 0, operator.truediv, (sum, n), fill_value=xp.nan)


Explanation here: https://github.com/data-apis/array-api-extra/pull/141/files#r1961964801

Haven't studied the reason yet, but change looks fine.

crusaderky · 2025-02-19T16:11:48Z

> > Note that marray tests do not run in CI

> They should be running in the array API job because we install marray there. Is that not happening?

You're correct, my bad.

crusaderky · 2025-02-19T16:20:48Z

> To reduce the diff and potential for regressions, can we move the existing _lazywhere to stats._distn_infrastructure and use it there

To clarify, are you asking me to revert to _lazywhere in

scipy.stats._continuous_distns
scipy.stats._discrete_distns
scipy.stats._distn_infrastructure

and leave xpx.apply_where in the rest of scipy.stats?

scipy.stats._levy_stable
scipy.stats._multivariate
scipy.stats._stats_py
scipy.stats._wilcoxon

> (with a note to use it only in the old distribution infrastructure and old distributions)?

Sorry I'm not familiar with what you mean with "old" versus "new" infrastructure - could you explain, give me a pointer to the documentation, or give me the exact verbiage?

mdhaber · 2025-02-19T16:31:58Z

Thanks for asking. Yeah - very close. scipy.stats._continuous_distns, scipy.stats._discrete_distns, scipy.stats._distn_infrastructure, scipy.stats._levy_stable, and scipy.stats._multivariate are "old" in contrast to scipy.stats._distribution_infrastructure, scipy.stats._probability_distribution, and scipy.stats._new_distributions (which need to be renamed at some point). So originally I would have asked for those to be left alone.

But you know what - since you've already done the work, let's just revert the changes in scipy.stats._distn_infrastructure, move _lazywhere there, and restrict use of _lazywhere to that. I'll review the rest of the changes here.

Never mind. Let me review more closely. If it's all very systematic, we can just change it all. I thought there were more changes in _distn_infrastructure, which is the one I really wanted to avoid, but there are only a few.

scipy/stats/_multivariate.py

scipy/stats/_stats_py.py

mdhaber · 2025-02-19T16:52:04Z

scipy/stats/_discrete_distns.py

+            (r != 0) | (k != 0),
+            lambda k, M, n, r:
+                (-betaln(k+1, r) + betaln(k+r, 1)
+                 - betaln(n-k+1, M-r-n+1) + betaln(M-r-k+1, 1)
+                 + betaln(n+1, M-n+1) - betaln(M+1, 1)),
+             (k, M, n, r), fill_value=0.0)


One thing that is confusing me while reviewing these is that the third positional argument can be the arguments when fill_value is used, but the third positional argument is f2 elsewhere. In these cases, please consider https://github.com/data-apis/array-api-extra/pull/141/files#r1962054546 or using keywords.
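One way to remove the ambiguity described above is to fix the positional order and make the fill value keyword-only. The sketch below is a toy, list-based illustration of that signature style; `apply_where_sketch` is a hypothetical name, and the body is an assumption for illustration rather than the array-api-extra code.

```python
def apply_where_sketch(cond, args, f1, f2=None, /, *, fill_value=None):
    """Hypothetical list-based illustration of an unambiguous signature."""
    # Exactly one of f2 and fill_value must be supplied.
    if (f2 is None) == (fill_value is None):
        raise TypeError("pass exactly one of f2 or fill_value")
    out = []
    for i, c in enumerate(cond):
        elems = [a[i] for a in args]
        if c:
            out.append(f1(*elems))      # condition true: apply f1
        elif f2 is not None:
            out.append(f2(*elems))      # condition false: apply f2
        else:
            out.append(fill_value)      # condition false: use the fill value
    return out


k = [0, 2]
r = [0, 3]
cond = [(ri != 0) or (ki != 0) for ki, ri in zip(k, r)]

# The positional slots are always (cond, args, f1[, f2]); fill_value needs its keyword.
res = apply_where_sketch(cond, (k, r), lambda k, r: k + r, fill_value=0.0)
```

With this layout the third positional argument is always `f1`, and a reader can tell a fill value from a second function at a glance because only the former is spelled with a keyword.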

scipy/_lib/tests/test__util.py

mdhaber

I reviewed most files and it all looks pretty good, but I'd like to address https://github.com/scipy/scipy/pull/22557#discussion_r1962042105 / https://github.com/data-apis/array-api-extra/pull/141/files#r1962054546 before finishing up. Also, I'd feel more comfortable with the change if the test with hypothesis were run on apply_where (e.g. if it were brought over to array-api-extra). That was written pretty carefully IIRC.

scipy/_lib/_util.py

rgommers

This does look quite good to me. One minor question inline. I agree with @mdhaber's review - can be merged once xpx.apply_where is in.

> Slightly reworked, hopefully improved UI

Yep I think so!

rgommers · 2025-03-18T04:28:29Z

scipy/stats/_stats_py.py

@@ -692,11 +689,11 @@ def tmean(a, limits=None, inclusive=(True, True), axis=None):
    # explicit dtype specification required due to data-apis/array-api-compat#152
    sum = xp.sum(a, axis=axis, dtype=a.dtype)
    n = xp.sum(xp.asarray(~mask, dtype=a.dtype), axis=axis, dtype=a.dtype)
-    mean = _lazywhere(n != 0, (sum, n), xp.divide, xp.nan)
+    mean = xpx.apply_where(n != 0, (sum, n), operator.truediv, fill_value=xp.nan)


Did xp.divide not work?

See data-apis/array-api-extra#160:

> If you forget about the meta-namespace and just use xp in the lambdas, at the moment most things will keep working.
> This is because several functions in the dask.array, numpy, and cupy namespaces happen to be interoperable, or are even the same function. However, you will find cases where this doesn't hold true and you need the correct namespace.

> This will become a much bigger source of headaches in the future, when Dask wrapping generic Array API compatible namespaces becomes commonplace (note: Dask does NOT support them today).

> This pattern repeats itself many, many times in scipy. At the moment there are only a handful of cases that are array API-aware, and they all use xp.divide, so the problem can be worked around by replacing it with operator.truediv. But if you look at scipy.stats in #22557 you'll find a myriad of calls to np. functions inside the lambdas.

Ah yes, I remember that now. Given that the code now looks a little harder to understand and this trick won't actually work in other places, I'd probably keep it unchanged and have an issue for solving this problem - and just fail the Dask test in the meantime.

If that's too much churn and this one-line change is more pragmatic, then fine with me as well of course. Dask just has a bunch of issues with namespacing, and this feels like a hack that happens to work.

AFAIK operator.truediv and xp.divide are identical though? It doesn't feel harder to read to me before or after the change?
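The difference discussed in this thread can be illustrated with a toy model: `operator.truediv` dispatches to the array object's own `__truediv__`, while a `divide` function belongs to one specific namespace. Everything below (`ToyArray`, `make_divide`, the "outer" and "inner" labels) is hypothetical and only mimics the shape of the problem; it does not reproduce actual Dask behaviour.

```python
import operator


class ToyArray:
    """Hypothetical stand-in for an array type; not a real library class."""

    def __init__(self, data, namespace):
        self.data = list(data)
        self.namespace = namespace

    def __truediv__(self, other):
        # Element-wise division implemented by the array type itself.
        return ToyArray((a / b for a, b in zip(self.data, other.data)),
                        self.namespace)


def make_divide(namespace):
    """Build a toy ``divide`` that only accepts arrays from its own namespace."""
    def divide(x, y):
        if x.namespace != namespace or y.namespace != namespace:
            raise TypeError(f"{namespace}.divide got a foreign array")
        return x / y
    return divide


# What a callback would capture if it were written as ``xp.divide``.
outer_divide = make_divide("outer")

# Inside the callback, the arrays belong to a different (inner) namespace.
x = ToyArray([6.0, 9.0], "inner")
y = ToyArray([3.0, 3.0], "inner")

# Works: the operator defers to ToyArray.__truediv__, whatever the namespace.
quotient = operator.truediv(x, y)

# Fails in this toy model: the captured function is bound to the wrong namespace.
try:
    outer_divide(x, y)
    bound_divide_failed = False
except TypeError:
    bound_divide_failed = True
```

For plain NumPy arrays the two spellings give the same result, which is why the change reads as a no-op there; the operator form simply avoids naming a namespace inside the callback.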

crusaderky · 2025-03-20T08:41:09Z

Test failures are unrelated. @mdhaber this is ready to merge!

crusaderky requested review from person142, steppi and andyfaff as code owners February 19, 2025 14:08

github-actions bot added scipy.stats scipy.special scipy.optimize scipy._lib labels Feb 19, 2025

crusaderky changed the title ~~DNM Replace _lazy_apply with xpx.apply_where~~ DNM Replace _lazywhere with xpx.apply_where Feb 19, 2025

lucascolley requested review from mdhaber and removed request for andyfaff, steppi and person142 February 19, 2025 15:38

lucascolley added maintenance Items related to regular maintenance tasks array types Items related to array API support and input array validation (see gh-18286) labels Feb 19, 2025

crusaderky force-pushed the apply_where branch from 426254a to 226a055 Compare February 19, 2025 15:39

crusaderky mentioned this pull request Feb 19, 2025

ENH: apply_where (migrate lazywhere from scipy) data-apis/array-api-extra#141

Merged

crusaderky commented Feb 19, 2025

crusaderky marked this pull request as draft February 19, 2025 16:21

mdhaber reviewed Feb 19, 2025

scipy/stats/_multivariate.py (outdated, resolved)

mdhaber reviewed Feb 19, 2025

scipy/stats/_stats_py.py (outdated, resolved)

mdhaber reviewed Feb 19, 2025

scipy/_lib/tests/test__util.py (resolved)

mdhaber reviewed Feb 19, 2025

mdhaber reviewed Feb 19, 2025

scipy/_lib/_util.py (resolved)

crusaderky force-pushed the apply_where branch from 226a055 to 61bd874 Compare February 19, 2025 18:45

crusaderky added 6 commits March 12, 2025 11:45

Merge branch 'main' into apply_where

e8ee09a

fix

5e4c8b6

lint

afc0b99

xp_capabilities semplification

38fbb9c

Merge branch 'main' into apply_where

c4f9c64

Bump xpx

de84337

crusaderky force-pushed the apply_where branch from 24dc569 to de84337 Compare March 17, 2025 16:33

bump xpx

aedb08f

rgommers approved these changes Mar 18, 2025

crusaderky mentioned this pull request Mar 18, 2025

Dask meta-namespace in apply_where data-apis/array-api-extra#160

Closed

crusaderky added 2 commits March 18, 2025 23:18

Merge branch 'main' into apply_where

a665c45

Bump xpx

abc5959

crusaderky changed the title ~~[DNM] ENH: Replace _lazywhere with xpx.apply_where~~ ENH: Replace _lazywhere with xpx.apply_where Mar 18, 2025

crusaderky marked this pull request as ready for review March 18, 2025 23:22

crusaderky added 6 commits March 19, 2025 13:02

MAINT: bump array-api-extra to 0.7.0

f60aed7

Merge branch 'main' into apply_where

e705372

Merge branch 'xpx_bump' into apply_where

775e4c1

fix

28624b1

Merge branch 'xpx_bump' into apply_where

5c3aa4a

Merge branch 'main' into apply_where

7f54bf4

mdhaber mentioned this pull request Mar 19, 2025

marray fails with xpx.apply_where and xpx.at mdhaber/marray#97

Open

lucascolley added this to the 1.16.0 milestone Mar 20, 2025

mdhaber approved these changes Mar 22, 2025

mdhaber merged commit 0a31577 into scipy:main Mar 22, 2025
39 of 41 checks passed

crusaderky deleted the apply_where branch March 23, 2025 18:38

mathause mentioned this pull request Mar 24, 2025

scipy removed _lazywhere statsmodels/statsmodels#9542

Closed

runtingt mentioned this pull request Jun 23, 2025

Scipy removed _lazywhere scikit-hep/coffea#1355

Closed

lgray mentioned this pull request Jun 23, 2025

fix: _lazywhere was removed from scipy, use apply_where from scipy._lib.array_api_extra scikit-hep/coffea#1356

Merged

jeongyoonlee mentioned this pull request Jun 28, 2025

Move calibrate() inside the PropensityModel class uber/causalml#839

Merged

10 tasks
