-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
min/max of empty arrays #5032
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
gh-2670 is a very different thing. There you calculate the min/max over 2 elements, you just happen to do it 0 times. Here you want to calculate the min/max over 0 elements and that is not well defined. What is |
I will close this for now, but please feel free to ask/post more and convince me otherwise :). |
OK, I see your point, however can't see why it behaves differently for different empty arrays (e.g. getting the minimum for
|
As noted above, the difference is that axis=1 computes minimum over two elements, whereas axis=0 computes it over zero elements. Only the latter has to invent numbers with undefined values as the results. In principle, |
Maybe to make the difference more clear, compare it with
In one case, the result is empty itself, in the other case it is not. What happens if you got Certainly not raising an error but returning something like NaN or Inf might be a reasonable choice, but I think it would be a rather largish change and changing it would have to follow a discussion the mailing list. |
-10 on returning Nan (that's just wrong!), -1 on Inf (what do you do and anyway all of the reduction operations behave like this, e.g.:
|
It seems to me that only thing you could really return is an empty array. |
I believe it'd be perfectly fine to provide an option to manually specify meaningful identity in a min/max/reduce call. |
@RerumNovarum : Before anything like that can be implemented, #8955 needs to be merged |
Repeating from #8955: I would suggest adding identities to I agree this probably doesn't match Python conventions but does make sense mathematically. Not to mention, I agree it depends on how you define it, but this is probably the most widely accepted view. |
Your argument for signed
It's worth noting that core python has decided that |
I guess you could cast and then return whatever is the maximum of
I would still argue that the identity value should be Edit: Or maybe |
I think this would be too surprising to be useful. Summarizing the problem:
The only way this can hold when N == 0, and still satisfy the property of being an identity, is if To quote the zen of python
So I'm also -1 on a default for empty arrays. I'm +1 on allowing That patch would look something like adding an |
I'll look into it. I'm not too deep into the annals of core Numpy, if you could point me towards relevant files that'd be a big help. As long as they're Cython/Pure C++ I should be able to do it, but if they're Python API C++... I'm not entirely sure. Edit: I'm talking about implementing |
The function I mention is defined here numpy/numpy/core/src/umath/ufunc_object.c Lines 2833 to 2898 in e6956dd
And called here: numpy/numpy/core/src/umath/ufunc_object.c Lines 3632 to 3914 in e6956dd
You'd need to:
|
Why leave it at |
|
Can you reopen the issue? |
why reopen? Having an initial seems like the only thing to do here for sure? |
Ooops misread, har :) |
FWIW, I agree with @hameerabbasi. Just as the
It ought to be the case that the
This is the mathematically correct way to treat the supremum/infimum of an empty collection. Related: jax-ml/jax#18661. |
Echoing some of my thoughts from jax-ml/jax#18661 here in case it is helpful: The problem with using the identity for min/max is that the supremum and infimum are only well-defined for bounded sets. Real numbers and integers are unbounded sets, so the supremum and infimum are not defined. While it's true that fixed-width approximations to these sets (like the sets of values representable by Contrast this with All that to say, I think the current approach used by both NumPy and Python (where min/max of empty sequences errors, unless an explicit identity is provided) is the right default behavior. |
FWIW, IEEE 754 floating point numbers model the affinely extended reals more so than the reals, per se, and -∞ and +∞ are perfectly fine members and indeed the well-defined infimum and supremum identity values for While I don't think it's necessary to define the result of |
OK, fair enough. But I do think it would be quite surprising and confusing to the majority of NumPy users if |
+1 to throwing an exception but accepting an optional We don't have to make assumptions about how exactly the end users interpret their numbers, so let's just not |
Note that the identity argument is already available as of NumPy 1.15 (but it's called >>> np.arange(0).min(initial=np.iinfo('int64').max)
9223372036854775807 |
A possible compromise / feature that occurs to me is adding .max and .min attributes / properties directly to dtypes, instead of alternating between iinfo, finfo, and a special case for bool, depending on the type. a.max(initial=a.dtype.min) From an API design POV, it seems to me that the common fields of the objects returned by iinfo and finfo (min, max, bits) ought to be accessible via the same interface. |
I have made a lot of work to allow adding such attributes to the dtype instances (not sure I like
Right, just to note that the name is of course intentional, because Besides boolean, where I guess it is fully unambigious. It seems to me that we will remain at: yes, it is plausible to use I do think both are valid choices, but it still seems like more guesswork to me than others (like Probably the reason is that within many contexts the actual minimum value may be well defined, e.g. |
I'm interested in a uniform interface for accessing the minimum/maximum value of a given dtype (JAX example). Is there an open issue discussing this? If not, I could open one. |
Not sure, #27231 is somewhat related, though. You can open a new one and even work on it if you like. (I am not sure how simple it would be to add these attributes. It might be very weedy in C, but actually, I think it may just be possible to do it in |
A similar issue was raised in #2670, however the edge case of (0, ) shape is still not covered. I can go around it with checking for non-zero length in an
if
statement, but it would be more practical ifnp.min
could handle this, too.The text was updated successfully, but these errors were encountered: