-
-
Notifications
You must be signed in to change notification settings - Fork 5.4k
MAINT: special: factorial clean-ups #19989
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The failures here are just minor tolerance violations (because I tightened things from the very loose default of |
Just saw numpy/numpy#24680 & numpy/numpy#24770 in the numpy 2.0 release notes. That's even nicer (and IMO enough if we're testing this on numpy 2.0 only). Thanks a lot @mdhaber! 🙏 |
@mdhaber, so I picked this up again now that numpy 2.0 and thus the strict-mode in the assert-functions is available, but it is unfortunately not enough for the requirements here, in particular it still doesn't distinguish a scalar from a 0d array:
This is marginal improvement over numpy/numpy#24050 - i.e. the following now fails correctly:
- but for the factorial functions we expressly want to ensure that we do not mix up scalars and 0d-arrays, which came out of #15600 and has an outstanding RFC for wider application. The take-away I take from this is that I need to keep the Please let me know your thoughts. CC @steppi |
|
Does it make sense to use I'm happy to deal with the complex corner case separately (even if that means keeping |
With
We can change the name/value I have no opinion about whether |
Thanks for the context!
I think that would be beneficial. With regard to the assert_* functions, the verb "require" makes much more sense to me than "accept".
I wasn't waxing philosophical. As long as it works technically, that's fine from my POV.
Sure, we can try that. |
yep, go ahead! |
Ok, would you open a PR that changes the name of the argument and adds the logic for complex NaNs? I think the only place that uses the non-default is Re: |
Sure. I had mistakenly assumed that this is shared code in https://github.com/data-apis/array-api-compat, but if it's only in scipy proper (i.e. doesn't show up when searching in array-api-compat), then that's much easier of course. |
Oh, great. Yes, the code is in scipy/scipy/_lib/tests/test_array_api.py Lines 122 to 129 in 628a6b3
One thing I thought I'd add relate
8000
d to the use in pure-NumPy contexts: it does not currently fail if the |
TBH, I don't think so. That same test fails further up with scipy/scipy/_lib/tests/test_array_api.py Lines 113 to 118 in 628a6b3
I wasn't involved, but to me, the original problem is "disallowing" 0d-inputs in the first place, as if they could somehow not be compared. To me that's in contradiction of what an The If we want to err on the side of caution (as suggested by the full warning message), then we shouldn't fail on scalar-vs-0d mismatches (like numpy does). I fundamentally disagree with that direction (c.f. RFC I mentioned), but that would at least be consistent. And then there could be a PS. Or perhaps I misinterpreted what "caution" was intended - if we want to avoid 0d-outputs passing silently, then just make |
It is, and that is the default. |
So would you be alright with changing |
I think EDIT: yeah, |
Given all the |
I'm not quite following why making judgements on the types of the Practically, the assertion is a testing utility which we use to make sure our functions are returning what we want them to return. The motivation for the current behaviour is in accordance with this purpose: trying to ensure that we return scalars where we are trying to. How we select when we are trying to return scalars and when we are trying to return arrays is a different question: a sensible default does seem to be the type of the Of course, one can argue that we should always be returning arrays (which I agree with in principle), but the awkward situation we have with NumPy (as #21026 (comment) shows) means that this should be a deliberate decision that ideally we can make consistent project-wide. To be honest, if we are going to move to always returning arrays, it would really be nice for NumPy to make the first move. Regarding properties of the assertions, such as self-identity and commutativity, I can see why these are intuitive properties of a function that helps us check that Given that there are places in SciPy where we want to return 0D arrays, and places where we want to return scalars (at least for now), I think that we very much do want options to enforce that What would make our lives really easy is if there was a decision that across SciPy we always wanted to return either scalars or 0D arrays, but I'm not sure we're going to get that soon. I think it would be useful to survey in more detail which parts of the codebase are supposed to return arrays vs which parts are supposed to return scalars. Then we can make a more informed decision on a sensible default. I think the risk of accidentally returning a scalar when we are supposed to return an array is quite low, but we should certainly be able to configure the assertion to catch either way. @h-vetinari I think it would be useful for you to pursue your PR, giving the assertions the semantics you have in mind, so we can see what is needed to get CI passing. Then we can compare that with alternative solutions, such as maintaining the current default but adding alternative configurations and improving documentation and error messages. |
The main objection is about If there's a stricter test needed which is perfectly fine a new named one or keyword is fine. I still don't get why array api does not provide this instead. |
I wouldn't be opposed to a new name! I suppose we didn't see a need to deviate from the standard names, but there's nothing stopping us from doing so in principle. |
I thought there was no point in running CI until one of us transitions to using the xp-assertions |
Ok, fair enough. Feel free to give this a shot, I'm low on time at the moment, so no promises until when I can get to it. Realistically all that should be necessary is removing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @h-vetinari, this looks like a great clean-up now if CI is happy!
def _is_subdtype(dtype, dtypes): | ||
""" | ||
Shorthand for calculating whether dtype is subtype of some dtypes. | ||
|
||
Also allows specifying a list instead of just a single dtype. | ||
|
||
Additionaly, the most important supertypes from | ||
https://numpy.org/doc/stable/reference/arrays.scalars.html | ||
can optionally be specified using abbreviations as follows: | ||
"i": np.integer | ||
"f": np.floating | ||
"c": np.complexfloating | ||
"n": np.number (contains the other three) | ||
""" | ||
dtypes = dtypes if isinstance(dtypes, list) else [dtypes] | ||
# map single character abbreviations, if they are in dtypes | ||
mapping = { | ||
"i": np.integer, | ||
"f": np.floating, | ||
"c": np.complexfloating, | ||
"n": np.number | ||
} | ||
dtypes = [mapping.get(x, x) for x in dtypes] | ||
return any(np.issubdtype(dtype, dt) for dt in dtypes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks good but we'll be able to replace it with just xp.isdtype
when array API standard support is added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to check out #19988 to see what I'm doing with this function in the end. :)
Not sure if that's part of the design for xp.isdtype
or whether it should be, but it's certainly a useful thing to have, rather than other extremely verbose variants of the same thing.
(try writing out _is_subdtype(type(n), ["i", "f", "c"])
"by hand" to see what I mean)
There was a problem hiding this comment.
yep, in the standard that would be spelled xp.isdtype(n.dtype, 'numeric')
https://data-apis.org/array-api/draft/API_specification/generated/array_api.isdtype.html.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bad example on my part then. Consider:
if _is_subdtype(type(n), ["f", "c"]):
if _is_subdtype(type(n), ["i", "f", "c", type(None)]):
or
# factorial allows floats also for extend="zero"
types_requiring_complex = "c" if fname == "factorial" else ["f", "c"]
if _is_subdtype(type(n), types_requiring_complex):
What I'm opposed to is writing out the full names like "real floating"
all the time. For the kind of code that's necessary for the factorial functions, this leads to an unreadable mess. I think single letters are clear (enough) in context and the gained improvement in legibility is so substantial that I consider it a necessity.
However, that helper doesn't have to leak beyond special/_basic.py
and I'm also perfectly happy to have _is_subdtype
dispatch to xp.isdtype
under the hood.
test failures for int dtypes on 32-bit and Windows |
For windows it looks like we need to increase some tolerances, for 32bit there's a wrong expectation of int being 64bit somewhere. Shouldn't be too hard to fix |
This prepares for an upcoming change; without this helper, supporting complex inputs becomes essentially unreadable.
avoid overloading already-complicated test_factorialk_scalar_corner_cases even further (and prepares for upcoming changes that expand it)
No functional change; reduce diff for upcoming work
having 1.1 cast to integer doesn't extend the test coverage in any way
…k} tests default for rtol is only 1e-7, see https://numpy.org/doc/stable/reference/generated/numpy.testing.assert_allclose.html
[skip cirrus]
2c25d19
to
97f8419
Compare
@lucascolley, I think this is getting ready now. I hope I've now caught all the tolerance violations; there were two other minor failures that I'm quite confident are unrelated to this PR. |
97f8419
to
2adcf27
Compare
@lucascolley can I take your existing approval, or do you want to give this another look-over? |
thanks again! |
Preparation for complex support in factorial functions (see #18409), which needs some preparations in the tests, as well as an auxiliary function
_is_subdtype
to be able to sanely express the necessary subtype conditions.No functional changes in this PR, aside from changing the text of some warning messages. Also tightens some tests that were using the woefully inadequate default of
rtol=1e-7
inassert_allclose
Preview of what this PR is preparing for can be found in #19988