ENH: Ensure that output of np.clip has the same dtype as the main array #24976
Comments
This is also what we decided for the array API: https://data-apis.org/array-api/latest/API_specification/generated/array_api.clip.html#clip

But I'm a little unclear what the ideal behavior should be when the min or max has a higher range than the input. Consider

```
>>> np.clip(np.asarray(0, dtype=np.int8), np.asarray(128, dtype=np.int16), None)
128
```

the result is a (promoted) int16. If we downcast the result back to int8, we get

```
>>> np.clip(np.asarray(0, dtype=np.int8), np.asarray(128, dtype=np.int16), None).astype(np.int8)
-128
```

This is also what happens with the suggested

```
>>> a, min = np.asarray(0, dtype=np.int8), np.asarray(128, dtype=np.int16)
>>> out = a.copy()
>>> out[a<min] = min
>>> out
array(-128, dtype=int8)
```

Should this be considered the correct answer? It seems to me another possibility would be for `np.clip` to raise an error in this case. For floats, downcasting just overflows to +/- inf, which is probably what you would want. Or should we reconsider the decision in the array API, and the suggestion here, to make `clip` keep the input dtype? |
@asmeurer - ah, never thought about the case when the minimum is larger than the largest value one can express... It might not be crazy to just error on that case... Though perhaps we are overthinking it, and should try to fix just the python min/max weak promotion case. For that case, raising an error would be consistent with the setting analogy, since that should eventually error too, at least according to the deprecation warning that is currently raised when setting an array element with an out-of-bound integer:
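Roughly like the following (an illustrative example, not the exact snippet from the original comment; the wording of the message varies by NumPy version, and in NumPy 2.0 it has become an error):

```python
>>> import numpy as np        # e.g. NumPy 1.26
>>> a = np.zeros(3, dtype=np.uint8)
>>> a[0] = -1                 # out-of-bound Python int
<stdin>:1: DeprecationWarning: NumPy will stop allowing conversion of
out-of-bound Python integers to integer arrays.  The conversion of -1
to uint8 will fail in the future.
```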
Maybe it is OK to use regular ufunc promotion when |
Interesting. I don't see that deprecation warning when I run it. I agree that if that sort of thing is already deprecated in other places, then it makes sense to disallow it here too. |
Oh I see, that warning (now actually an error in NumPy 2.0) comes from setting an array element with an out-of-bounds Python `int`. |
Yes, indeed, it is just python integers that are treated differently, and I wondered if the easiest solution would be to extend that treatment to `np.clip`.

p.s. Note that this is different from what I suggested on top! |
…the bounds of x As discussed in today's consortium meeting. See the discussion at numpy/numpy#24976.
FWIW, we decided to make this behavior unspecified in the standard. data-apis/array-api#814 |
I just want to add a voice to this thread: I think most users in this scenario would expect to be able to use arrays of any type and Python ints and just have it work — there is nothing unsatisfiable about such an expression, including keeping the input dtype. When you start dealing with numpy scalars and so on, the story is indeed more complicated as noted in the above discussion, but the Python int scenario seems like an easy fix (as noted by @mhvk above) that would unlock a lot of uses already.

Sorry about the noise, I expect there will be lots in this repo this coming week. 😅 Thank you all! 🙏 |
Indeed, a fix at least for Python ints, like the one for comparisons, seems the way forward. |
With the non-promoting behavior one should also be aware that the result might not actually satisfy the bounds once it is cast back to the input dtype:

```
>>> np.clip(np.asarray([0.0], dtype=np.float32), np.asarray([4.0311033624323596e-209], dtype=np.float64), np.asarray([1.0], dtype=np.float64))
array([4.03110336e-209])
>>> _.astype(np.float32)
array([0.], dtype=float32)
```

That's underflow, but the rounding could work against you in virtually any case:

```
>>> x = np.asarray([1.0], dtype=np.float32)
>>> min = np.asarray([1.00000001], dtype=np.float64)
>>> max = np.asarray([2.0], dtype=np.float64)
>>> np.clip(x, min, max).astype(np.float32)
array([1.], dtype=float32)
>>> min <= _
array([False])
```

Tbh, I'm starting to think this whole proposal is a bad idea and putting it in the standard was a mistake. Not type promoting implies downcasting, which just leads to too many weird behaviors. But if we decide to keep it in the array API and change NumPy, we should figure out reasonable behavior for ints and document the float behavior. |
I don't find those examples that bad tbh. The test is not whether they are smaller than min, it is whether they are smaller than the clipped-and-rounded min. Just like I shouldn't be surprised that casting `1.00000001` to float32 gives exactly `1.0`. |
Well note that you don't get this issue at all (rounding or integer overflow) if you just do type promotion. The original argument here is that type promotion on x is surprising, but for me this shows why it is necessary. The whole point of type promotion in general is that functions can produce a result that fits within the bounds of both input dtypes. |
I think I've been convinced of that too - the actionable part of this really is just to treat Python ints specially (as we do for comparisons). |
Another corner case I just discovered:

```
>>> np.clip(np.asarray(0, dtype=np.uint64), np.asarray(0), None)
np.float64(0.0)
```
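Presumably this comes from the usual promotion of `uint64` with the default integer dtype (`int64`), which has no common integer type and falls back to `float64`:

```python
>>> import numpy as np
>>> np.result_type(np.uint64, np.int64)
dtype('float64')
```
|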
FWIW, in data-apis/array-api#814 (comment) I proposed a set of semantics for `clip` (see the linked comment for the definition). I think this definition is general enough to account for most (all?) corner cases of inputs. It is implementable for built-in types by first “snapping” the bounds to a representable value in T “in the right direction” and then doing the obvious min(max(…)) thing. For custom numeric types such as Decimal or Fraction, one could just throw an error.
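For concreteness, here is a minimal sketch of that “snap the bounds, then min(max(…))” idea, assuming scalar bounds; `clip_keep_dtype` is a hypothetical helper name, not NumPy API:

```python
import numpy as np

def clip_keep_dtype(x, a=None, b=None):
    # Hypothetical sketch, not NumPy API: clip x to [a, b] while keeping
    # x.dtype, by first "snapping" each bound to a representable value of
    # x.dtype "in the right direction", then applying min(max(...)).
    T = x.dtype
    if np.issubdtype(T, np.integer):
        info = np.iinfo(T)
        lo = info.min if a is None else int(np.ceil(a))    # smallest integer >= a
        hi = info.max if b is None else int(np.floor(b))   # largest integer <= b
        if lo > info.max or hi < info.min or lo > hi:
            raise ValueError("T ∩ [a, b] is empty")         # e.g. int8 with a=128
        lo, hi = max(lo, info.min), min(hi, info.max)       # e.g. uint8 with a=-1 -> 0
        lo, hi = T.type(lo), T.type(hi)
    else:
        # Floats: cast the bounds; if casting rounded a bound "outwards",
        # nudge it one ulp back inside the interval.
        lo = T.type(-np.inf) if a is None else T.type(a)
        hi = T.type(np.inf) if b is None else T.type(b)
        if a is not None and lo < a:
            lo = np.nextafter(lo, T.type(np.inf))
        if b is not None and hi > b:
            hi = np.nextafter(hi, T.type(-np.inf))
    return np.minimum(np.maximum(x, lo), hi)
```

With this, `clip_keep_dtype(np.asarray(0, dtype=np.int8), 128)` raises instead of wrapping to -128, and a negative lower bound on an unsigned array is snapped to 0. |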
Thanks @fancidev! That matches my intuition as well. Indeed, when `T ∩ [a, b]` is empty, presumably the right thing is to raise an error? |
Yes I think so.
The behavior in that case could be tricky because numpy supports broadcasting of the arguments. What if some bounds are invalid and some are valid? Should it raise an error so that no result is produced at all? Or should it still return a result for the valid entries? I don’t see an obvious answer, but maybe raising an error would be the most prudent. |
`clip` has the following note in its documentation: when `a_min` is greater than `a_max`, `clip` returns an array in which all values are equal to `a_max`.

`b < a` is a special-case of `T ∩ [a, b]` being empty. |
Indeed, and that’s a useful hint. The C++17 standard for `std::clamp` contains a similar note, to the effect that the behavior is undefined if `lo` is greater than `hi`. A subtle difference is that in C++, `v`, `lo`, and `hi` have the same type, so that remark is exhaustive for undefined behavior. This is also the case as long as numpy performs type promotion. If numpy does not perform type promotion, then the remark becomes non-exhaustive (and thus a “special case”). |
…s outside the bounds of `x` PR-URL: #814 Ref: numpy/numpy#24976 Co-authored-by: Athan Reines <kgryte@gmail.com> Reviewed-by: Athan Reines <kgryte@gmail.com>
Proposed new feature or change:
(Discussed in #23912 (comment))
The function `np.clip` arguably has surprising casting behaviour: I would naively have expected the output dtype to always be the same as the input one. That this does not happen is because internally `np.clip` calls a ufunc (see `numpy/numpy/_core/_methods.py`, lines 92 to 101 in d885b0b).
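For reference, the code there is roughly the following (paraphrased; the permalink above has the exact version at that commit):

```python
# um is numpy's internal umath module, imported in _methods.py
def _clip(a, min=None, max=None, out=None, **kwargs):
    if min is None and max is None:
        raise ValueError("One of max or min must be given")

    if min is None:
        return um.minimum(a, max, out=out, **kwargs)
    elif max is None:
        return um.maximum(a, min, out=out, **kwargs)
    else:
        return um.clip(a, min, max, out=out, **kwargs)
```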
These ufuncs treat the arguments symmetrically.
It is possible to get the output dtype by setting `out` or `dtype`, but in the current implementation that still gives either an `OverflowError` or casting errors; adding `casting="unsafe"` gives the wrong answer, because `-1` becomes `255`.

I think it should be possible to make the `np.clip` function (probably not the ufunc) cast the `min` and `max` to `a.dtype`, but ensure that the valid ranges are respected (i.e., negative integers would become 0 if the dtype is unsigned). This would be similar to what was done in #24915, i.e., ideally we would have the behaviour of `np.clip` be identical to the masked assignment quoted earlier in the thread (sketched below), which still gives an out-of-bound error for `min=-1` because of `__setitem__`, but works for `min=np.int64(-1)`. But perhaps this is more work than is warranted.
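For reference, the masked-assignment equivalent referred to above (as quoted earlier in the thread) is roughly the following; the `max` line and the example values are added here for completeness:

```python
import numpy as np

a = np.arange(10, dtype=np.uint8)    # example input; the issue's original array is not shown
min, max = np.int64(0), np.int64(5)  # NumPy scalars; a plain Python -1 would error in __setitem__

out = a.copy()
out[a < min] = min
out[a > max] = max
# out keeps dtype uint8: array([0, 1, 2, 3, 4, 5, 5, 5, 5, 5], dtype=uint8)
```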