MAINT: special: factorial clean-ups by h-vetinari · Pull Request #19989 · scipy/scipy · GitHub

MAINT: special: factorial clean-ups #19989


Merged · 12 commits · Oct 14, 2024

Conversation

h-vetinari
Member

Preparation for complex support in the factorial functions (see #18409), which needs some groundwork in the tests, as well as an auxiliary function _is_subdtype to be able to sanely express the necessary subtype conditions.

No functional changes in this PR, aside from changing the text of some warning messages. Also tightens some tests that were using the woefully inadequate default of rtol=1e-7 in assert_allclose.

A preview of what this PR is preparing for can be found in #19988.
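
To see just how loose that default is, here is an illustrative snippet (not part of the diff):

import numpy as np
from numpy.testing import assert_allclose

# the default rtol=1e-7 accepts errors in the 8th significant digit
assert_allclose(1.0 + 1e-8, 1.0)                # passes with the default
# assert_allclose(1.0 + 1e-8, 1.0, rtol=1e-10)  # would raise AssertionError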

@github-actions bot added the scipy.special and maintenance (Items related to regular maintenance tasks) labels Jan 28, 2024
@h-vetinari
Member Author

The failures here are just minor tolerance violations (because I tightened things from the very loose default of 1e-7 in assert_allclose) - I'll bump the tolerances as necessary once there are any review requests.

@lucascolley removed the request for review from person142 on February 2, 2024 11:14
@h-vetinari
Member Author

Just saw numpy/numpy#24680 & numpy/numpy#24770 in the numpy 2.0 release notes. That's even nicer (and IMO enough if we're testing this on numpy 2.0 only). Thanks a lot @mdhaber! 🙏

@lucascolley changed the title from "MAINT: factorial clean-ups" to "MAINT: special: factorial clean-ups" on May 18, 2024
@h-vetinari
Member Author

@mdhaber, so I picked this up again now that numpy 2.0 (and thus the strict mode in the assert functions) is available, but it is unfortunately not enough for the requirements here; in particular, it still doesn't distinguish a scalar from a 0d array:

>>> import numpy as np
>>> from numpy.testing import assert_equal
>>> assert_equal(1, np.array(1, dtype=np.int64), strict=True)  # passes?!?

This is a marginal improvement over numpy/numpy#24050 - i.e. the following now fails correctly:

>>> assert_equal(1, np.array(1.0, dtype=np.int64), strict=True)  # now fails ✅
                             ^^^

- but for the factorial functions we expressly want to ensure that we do not mix up scalars and 0d-arrays, which came out of #15600 and has an outstanding RFC for wider application.

My take-away from this is that I need to keep the assert_really_equal function here to ensure coverage of this fact (plus the complex cases that will be added on the way towards #19988). If/once numpy provides the required strictness, I'd be more than happy to rip this out, but for now I do not want to pessimize the testing here, and I consider the assert_really_equal function an acceptable tool to achieve that.
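
Roughly, the extra strictness needed looks like this (a sketch of the idea only, not the actual helper; the name is hypothetical):

import numpy as np
from numpy.testing import assert_equal

def _assert_scalar_vs_0d_sketch(actual, desired):
    # fail on scalar-vs-0d mismatch, which strict=True alone does not catch
    actual_is_0d = isinstance(actual, np.ndarray) and actual.ndim == 0
    desired_is_0d = isinstance(desired, np.ndarray) and desired.ndim == 0
    assert actual_is_0d == desired_is_0d, (
        f"scalar/0d mismatch: {type(actual)} vs {type(desired)}")
    assert_equal(actual, desired, strict=True)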

Please let me know your thoughts. CC @steppi

@mdhaber
Contributor
mdhaber commented Jun 22, 2024

Use xp_assert_close with the accept_0d=True option. Never mind, I didn't remember the context; use whatever you need. I don't remember what happened with the NumPy strict checks - I tried to make them more widely available in the other assert functions, but I don't think I made them stricter than how they were originally written. xp_assert_close with accept_0d=True will make the distinction between 0d and scalar, but I don't know what it will do for separate real and complex NaNs.

@h-vetinari
Member Author

xp_assert_close with accept_0d=True will make this distinction between 0d and scalar

Does it make sense to use xp_assert_close with numpy arrays exclusively? If so, I don't mind switching to that (though I cannot tell from accept_0d= if I'd want True or False to error on - rather than "accept" - divergence between 0d arrays and scalars).

I'm happy to deal with the complex corner case separately (even if that means keeping assert_really_equal as a thin wrapper around xp_assert_close + the complex bits).

@mdhaber
Contributor
mdhaber commented Jun 22, 2024

With accept_0d=True, actual and desired must both be 0d arrays or both be scalars, not mixed, to pass.

accept_0d=False insists that the result be a scalar to pass (because, following NumPy's lead, that is the typical behavior of the functions we've translated so far).

We can change the name/value accept_0d=False to require_scalar=True if that would help. I know that the non-default is used in only one place right now, so easy to change.

I have no opinion about whether xp_assert_close can be used solely with NumPy arrays. It doesn't bother me at all. I am happy with this PR using its own custom function if it does what it needs to do, or perhaps you can add the distinction between the various complex NaN cases to xp_assert_close.

@h-vetinari
Member Author

Thanks for the context!

We can change the name to require_scalar if that would help.

I think that would be beneficial. With regard to the assert_* functions, the verb "require" makes much more sense to me than "accept".

I have no opinion about whether xp_assert_close can be used solely with NumPy arrays.

I wasn't waxing philosophical. As long as it works technically, that's fine from my POV.

or perhaps you can add the distinction between the various complex NaN cases to xp_assert_close.

Sure, we can try that.

@lucascolley
Member

I have no opinion about whether xp_assert_close can be used solely with NumPy arrays.

I wasn't waxing philosophical. As long as it works technically, that's fine from my POV.

yep, go ahead!

@mdhaber
Contributor
mdhaber commented Jun 22, 2024

Ok, would you open a PR that changes the name of the argument and adds the logic for complex NaNs? I think the only place that uses the non-default is _lib/tests/test_util.py test of _lazywhere.

Re: require_scalar - it would probably need to be more like require_scalar_if_0d to be accurate. Or since it's a private tool, we may not need the keyword name to document all the details, in which case accept_0d is OK, and we can just make sure that comments give the fine print. Or reject_0d might be the best combination of concision and accuracy.

@h-vetinari
Member Author

Ok, would you open a PR that changes the name of the argument and adds the logic for complex NaNs?

Sure. I had mistakenly assumed that this is shared code in https://github.com/data-apis/array-api-compat, but if it's only in scipy proper (i.e. doesn't show up when searching in array-api-compat), then that's much easier of course.

@mdhaber
Contributor
mdhaber commented Jun 22, 2024

Oh, great. Yes, the code is in _lib/_array_api.py. BTW the keyword I was referring to is allow_0d, not accept_0d. That's much easier to interpret, I think. Hopefully that resolves that issue. Here's how it behaves:

# Check `allow_0d`
message = "Types do not match:\n..."
with pytest.raises(AssertionError, match=message):
    xp_assert_equal(xp.asarray(0.), xp.float64(0), allow_0d=True)
with pytest.raises(AssertionError, match=message):
    xp_assert_equal(xp.float64(0), xp.asarray(0.), allow_0d=True)
xp_assert_equal(xp.float64(0), xp.float64(0), allow_0d=True)
xp_assert_equal(xp.asarray(0.), xp.asarray(0.), allow_0d=True)

One thing I thought I'd add related to the use in pure-NumPy contexts: it does not currently fail if the desired object is pure Python and the actual is NumPy. It does fail if the actual object is Python and the desired is NumPy, though, and it does distinguish between other array backends and Python objects, so typically tests will fail if the actual backend is wrong. IIRC this just made the code a bit easier to write and respect all the check_ options independently, but it could probably be made more strict about this if desired. I think that would be a separate PR, though.
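
Concretely, the asymmetry looks like this (a sketch, assuming xp_assert_equal imported from the _lib/_array_api.py mentioned below, with NumPy as the backend):

import numpy as np
from scipy._lib._array_api import xp_assert_equal

xp_assert_equal(np.asarray(0.), 0.)    # desired is pure Python: currently passes
# xp_assert_equal(0., np.asarray(0.))  # actual is pure Python: raises AssertionError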

@h-vetinari
Member Author
h-vetinari commented Jun 22, 2024

BTW the keyword I was referring to is allow_0d, not accept_0d. That's much easier to interpret, I think.

TBH, I don't think so. That same test fails further up with:

# Check default convention: 0d arrays are not allowed
message = "Result is a NumPy 0d array. Many SciPy functions..."
with pytest.raises(AssertionError, match=message):
    xp_assert_equal(xp.asarray(0.), xp.float64(0))
with pytest.raises(AssertionError, match=message):
    xp_assert_equal(xp.asarray(0.), xp.asarray(0.))

I wasn't involved, but to me, the original problem is "disallowing" 0d inputs in the first place, as if they could somehow not be compared. To me that's at odds with what an assert_* should do (i.e. determine whether two inputs are sufficiently similar, not make judgements like "you passed the wrong type of input").

The allow_0d is then a band-aid that tries to work around the underlying issue.

If we want to err on the side of caution (as suggested by the full warning message), then we shouldn't fail on scalar-vs-0d mismatches (like numpy does). I fundamentally disagree with that direction (cf. the RFC I mentioned), but it would at least be consistent. And then there could be a strict_0d=True (or simply strict=True) keyword that actually enforces this.

PS. Or perhaps I misinterpreted what kind of "caution" was intended - if we want to avoid 0d outputs passing silently, then just make strict=True the default for xp_assert_equal.

@mdhaber
Contributor
mdhaber commented Jun 22, 2024

if we want to avoid 0d-outputs passing silently, then just make strict=True the default for xp_assert_equal

It is, and that is the default.

@h-vetinari
Member Author
h-vetinari commented Jun 22, 2024

So would you be alright with changing xp_assert_equal from (default) allow_0d=False to strict=True (or strict_0d=True), and making it fully symmetrical (another crucial property of the assert_* family IMO)?

@lucascolley
Member

I think strict is too vague - see also the discussion in gh-20959, @h-vetinari.

EDIT: yeah, strict_0d is better

@h-vetinari
Member Author

yeah, strict_0d is better

Given all the check_* options in the signature (and the existence of a _strict_check helper), we might want to go with check_0d

@lucascolley
Member
lucascolley commented Jun 23, 2024

I wasn't involved, but to me, the original problem is "disallowing" 0d inputs in the first place, as if they could somehow not be compared. To me that's at odds with what an assert_* should do (i.e. determine whether two inputs are sufficiently similar, not make judgements like "you passed the wrong type of input").

The allow_0d is then a band-aid that tries to work around the underlying issue.

I'm not quite following why making judgements on the types of the actual and desired arguments is "at odds with what an assert_* should do". The addition of stricter checks for types, shapes and dtypes was made exactly because it seemed like what we wanted the assert_* functions to be able to do.

Practically, the assertion is a testing utility which we use to make sure our functions are returning what we want them to return. The motivation for the current behaviour is in accordance with this purpose: trying to ensure that we return scalars where we are trying to. How we select when we are trying to return scalars and when we are trying to return arrays is a different question: a sensible default does seem to be the type of the desired argument, although that leaves open the possibility that devs forget to specify that they actually want a scalar, and thus accidentally return an array. Such a behavioural change is very easy to make accidentally during array API conversion, given that the first thing that happens to any scalar is that it gets passed to xp.asarray.
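
That first step is exactly where the change can slip in; a REPL sketch:

>>> import numpy as np
>>> x = np.asarray(1.5)    # a scalar silently becomes a 0d array
>>> type(x), x.shape
(<class 'numpy.ndarray'>, ())
>>> type(x[()])            # extracting the scalar back out
<class 'numpy.float64'>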

Of course, one can argue that we should always be returning arrays (which I agree with in principle), but the awkward situation we have with NumPy (as #21026 (comment) shows) means that this should be a deliberate decision that ideally we can make consistent project-wide. To be honest, if we are going to move to always returning arrays, it would really be nice for NumPy to make the first move.

Regarding properties of the assertions, such as self-identity and commutativity, I can see why these are intuitive properties of a function that helps us check that actual is somehow the same as desired. However, if it were the case that breaking these properties helps us to more easily ensure that our functions are returning what we want them to return, I don't see any problem in principle with doing so. Breaking intuition in a private testing utility seems to be a matter of ensuring adequate documentation and helpful error messages, rather than a complete no-go, if it is otherwise beneficial.

Given that there are places in SciPy where we want to return 0D arrays, and places where we want to return scalars (at least for now), I think that we very much do want options to enforce that actual is 0D or actual is scalar, regardless of the type of desired, as it will help catch potential developer error. While xp_assert_equal(a, a) ever returning False is certainly an unintuitive default, I don't think that we can assert that it is an incorrect default. I don't think that our goal with the private testing utility necessarily has to be to provide the most intuitive API - rather, the goal should be to help us test well and test easily. Those two goals may coincide, but I don't think it is quite so simple that, say, a lack of self-identity is something incorrect which needs to be fixed.


What would make our lives really easy is if there was a decision that across SciPy we always wanted to return either scalars or 0D arrays, but I'm not sure we're going to get that soon.

I think it would be useful to survey in more detail which parts of the codebase are supposed to return arrays vs which parts are supposed to return scalars. Then we can make a more informed decision on a sensible default. I think the risk of accidentally returning a scalar when we are supposed to return an array is quite low, but we should certainly be able to configure the assertion to catch either way.

@h-vetinari I think it would be useful for you to pursue your PR, giving the assertions the semantics you have in mind, so we can see what is needed to get CI passing. Then we can compare that with alternative solutions, such as maintaining the current default but adding alternative configurations and improving documentation and error messages.

@ilayn
Member
ilayn commented Jun 23, 2024

The main objection is about the equal part, not the assert part. Type checking does not belong in equality checks, which we have been using extensively in the testing suite for a long time. Value equality has always been the purpose of these funcs - not only that, but they also allow checking integers against integral floats. Type and array checking injects metadata into these assertions, and I don't see any good reason to break the mental model for that.

If a stricter test is needed - which is perfectly fine - a new named function or keyword is fine. I still don't get why the array API does not provide this instead.

@lucascolley
Member

I wouldn't be opposed to a new name! I suppose we didn't see a need to deviate from the standard names, but there's nothing stopping us from doing so in principle.

@lucascolley
Member

I thought there was no point in running CI until one of us transitions to using the xp-assertions

@h-vetinari
Member Author

Ok, fair enough. Feel free to give this a shot; I'm low on time at the moment, so no promises on when I can get to it. Realistically, all that should be necessary is removing assert_really_equal and replacing call sites with xp_assert_equal.

Member
@lucascolley lucascolley left a comment


thanks @h-vetinari, this looks like a great clean-up now if CI is happy!

Comment on lines +2954 to +2987
def _is_subdtype(dtype, dtypes):
    """
    Shorthand for calculating whether dtype is a subtype of some dtypes.

    Also allows specifying a list instead of just a single dtype.

    Additionally, the most important supertypes from
    https://numpy.org/doc/stable/reference/arrays.scalars.html
    can optionally be specified using abbreviations as follows:
        "i": np.integer
        "f": np.floating
        "c": np.complexfloating
        "n": np.number (contains the other three)
    """
    dtypes = dtypes if isinstance(dtypes, list) else [dtypes]
    # map single character abbreviations, if they are in dtypes
    mapping = {
        "i": np.integer,
        "f": np.floating,
        "c": np.complexfloating,
        "n": np.number
    }
    dtypes = [mapping.get(x, x) for x in dtypes]
    return any(np.issubdtype(dtype, dt) for dt in dtypes)
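
For orientation, the helper behaves like this (illustrative usage, not part of the diff):

>>> _is_subdtype(np.float32, "f")
True
>>> _is_subdtype(np.int64, ["f", "c"])
False
>>> _is_subdtype(np.complex128, "n")
True
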
Member

this looks good, but we'll be able to replace it with just xp.isdtype when array API standard support is added.

Member Author

You might want to check out #19988 to see what I'm doing with this function in the end. :)

Not sure if that's part of the design for xp.isdtype or whether it should be, but it's certainly a useful thing to have, rather than other extremely verbose variants of the same thing.

(try writing out _is_subdtype(type(n), ["i", "f", "c"]) "by hand" to see what I mean)

Member

yep, in the standard that would be spelled xp.isdtype(n.dtype, 'numeric'); see https://data-apis.org/array-api/draft/API_specification/generated/array_api.isdtype.html
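
For example, a quick sketch against the array_api_strict reference implementation (assumed installed):

import array_api_strict as xp

n = xp.asarray(3.0)
xp.isdtype(n.dtype, 'numeric')           # True for integer, real, and complex dtypes
xp.isdtype(n.dtype, 'complex floating')  # False for a float64 array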

Member Author

Bad example on my part then. Consider:

if _is_subdtype(type(n), ["f", "c"]):
if _is_subdtype(type(n), ["i", "f", "c", type(None)]):

or

# factorial allows floats also for extend="zero"
types_requiring_complex = "c" if fname == "factorial" else ["f", "c"]
if _is_subdtype(type(n), types_requiring_complex):

What I'm opposed to is writing out the full names like "real floating" all the time. For the kind of code that's necessary for the factorial functions, this leads to an unreadable mess. I think the single letters are clear (enough) in context, and the gain in legibility is so substantial that I consider it a necessity.
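
For comparison, spelling out _is_subdtype(type(n), ["i", "f", "c"]) without the helper would read something like this (hypothetical "by hand" equivalent):

if (np.issubdtype(type(n), np.integer)
        or np.issubdtype(type(n), np.floating)
        or np.issubdtype(type(n), np.complexfloating)):
    ...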

However, that helper doesn't have to leak beyond special/_basic.py and I'm also perfectly happy to have _is_subdtype dispatch to xp.isdtype under the hood.

@lucascolley added this to the 1.15.0 milestone on Sep 17, 2024
@lucascolley
Member

test failures for int dtypes on 32-bit and Windows

@h-vetinari
Member Author

For Windows it looks like we need to increase some tolerances; for 32-bit there's a wrong expectation somewhere that the default int is 64-bit. Shouldn't be too hard to fix.
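
For reference, the platform dependence in question is the default integer dtype (illustrative):

>>> import numpy as np
>>> np.array([1]).dtype  # dtype('int64') on 64-bit Linux/macOS; dtype('int32') on 32-bit platforms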

@h-vetinari force-pushed the factorial_cleanup branch 3 times, most recently from 2c25d19 to 97f8419 on October 12, 2024 13:44
@h-vetinari
Member Author

@lucascolley, I think this is getting ready now. I hope I've now caught all the tolerance violations; there were two other minor failures that I'm quite confident are unrelated to this PR.

@h-vetinari
Member Author

@lucascolley can I take your existing approval, or do you want to give this another look-over?

@lucascolley merged commit da4b549 into scipy:main on Oct 14, 2024
39 checks passed
@lucascolley
Member

thanks again!
