ENH: Ensure that output of np.clip has the same dtype as the main array #24976

mhvk · 2023-10-21T15:50:34Z

Proposed new feature or change:

The function np.clip arguably has surprising casting behaviour:

a = np.arange(5, dtype='u1')
np.clip(a, -1, 3)
# OverflowError with NEP 50
np.clip(a, np.int64(-1), np.int64(3))
# array([0, 1, 2, 3, 3])  
# int64 dtype with NEP 50
# (before NEP 50, both examples gave int16)

I would naively have expected for the output dtype to always be the same as the input one. That this does not happen is because internally np.clip calls a ufunc:

numpy/numpy/_core/_methods.py

Lines 92 to 101 in d885b0b

    
           def _clip(a, min=None, max=None, out=None, **kwargs): 
        
               if min is None and max is None: 
        
                   raise ValueError("One of max or min must be given") 
        
               if min is None: 
        
                   return um.minimum(a, max, out=out, **kwargs) 
        
               elif max is None: 
        
                   return um.maximum(a, min, out=out, **kwargs) 
        
               else: 
        
                   return um.clip(a, min, max, out=out, **kwargs)

and these treat the arguments symmetrically.

It is possible to get the output dtype by setting out or dtype, but in the current implementation that still gives either the OverflowError or casting errors:

np.clip(a, np.int64(-1), np.int64(3), out=a)  # or dtype=a.dtype
# UFuncTypeError: Cannot cast ufunc 'clip' output from dtype('int64') to dtype('uint8') with casting rule 'same_kind'

adding casting="unsafe" gives the wrong answer, because -1 becomes 255.

I think it should be possible to make the np.clip function (probably not the ufunc) cast the min and max to a.dtype, but ensure that the valid ranges are respected (i.e., negative integers would become 0 if the dtype is unsigned). This would be similar to what was done in #24915, i.e., ideally we have the behaviour of np.clip be identical to

min = -1
max = 3
out = a.copy()
out[a<min] = min
out[a>max] = max
return out

(which still gives an out-of-bound error for min=-1 because of __setitem__, but works for min=np.int64(-1))

But perhaps this is more work than is warranted.

The text was updated successfully, but these errors were encountered:

asmeurer · 2024-06-07T22:56:29Z

This is also what we decided for the array API https://data-apis.org/array-api/latest/API_specification/generated/array_api.clip.html#clip

But I'm a little unclear what the ideal behavior should when the min or max has a higher range than the input. Consider

>>> np.clip(np.asarray(0, dtype=np.int8), np.asarray(128, dtype=np.int16), None)
128

the result is a (promoted) int16. If we downcast the result back to int8, we get -128:

>>> np.clip(np.asarray(0, dtype=np.int8), np.asarray(128, dtype=np.int16), None).astype(np.int8)
-128

This is also what happens with the suggested out[a<min] = min behavior:

>>> a, min = np.asarray(0, dtype=np.int8), np.asarray(128, dtype=np.int16)
>>> out = a.copy()
>>> out[a<min] = min
>>> out
array(-128, dtype=int8)

Should this be considered the correct answer? It seems to me another possibility would be for clip to avoid wrapping and instead "clip", as it were, large values to iinfo(a.dtype).max (i.e., the above would return 127)?

For floats, downcasting just overflows to +/- inf, which is probably what you would want.

Or should we reconsider the decision in the array API, and the suggestion here, to make clip not perform type promotion?

mhvk · 2024-06-08T21:11:13Z

@asmeurer - ah, never thought about the case when the minimum is larger than the largest value one can express... It might not be crazy to just error on that case...

Though perhaps we are overthinking it, and should try to fix just the python min/max weak promotion case. For that case, raising an error would be consistent with setting analogy, since that should eventually error too, at least according to the deprecation warning that is currently raised when setting an array element with an out-of-bound integer:

DeprecationWarning: NumPy will stop allowing conversion of out-of-bound Python integers to integer arrays.  The conversion of 256 to uint8 will fail in the future.

Maybe it is OK to use regular ufunc promotion when min, max have different dtype (or at least we can punt...).

asmeurer · 2024-06-11T22:31:21Z

Interesting. I don't see that deprecation warning when I run out[a<min] = min. Is it supposed to show there?

I agree if that sort of thing already deprecated in other places then it makes sense to disallow it here too.

asmeurer · 2024-06-11T22:34:46Z

Oh I see, that warning (now actually an error in NumPy 2.0) comes from setting an array with an out-of-bounds Python int. That's a very different thing that downcasting an array with a higher precision integer dtype. I expect that sort of thing is done all the time and isn't something NumPy would want to deprecate.

mhvk · 2024-06-12T06:03:08Z

Yes, indeed, it is just python integers that are treated differently, and I wondered if the easiest solution would be to extend that to np.clip. It seems fairly reasonable to just change dtype if one passes in multiple arrays -- e.g., np.clip(input_u1, min_i2, max_i2) -- and then by analogy we should do the same thing for numpy scalars.

p.s. Note that this is different from what I suggested on top!

…the bounds of x As discussed in today's consortium meeting. See the discussion at numpy/numpy#24976.

asmeurer · 2024-06-13T22:30:39Z

FWIW, we decided to make this behavior unspecified in the standard. data-apis/array-api#814

jni · 2024-06-18T04:26:06Z

I just want to add a voice to this thread that np.clip as currently implemented (numpy 2.0.0) is quite hard to use:

>>> np.clip(np.array([0, 255], dtype=np.uint8), 0, 4550)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jni/micromamba/envs/np2/lib/python3.12/site-packages/numpy/_core/fromnumeric.py", line 2247, in clip
    return _wrapfunc(a, 'clip', a_min, a_max, out=out, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jni/micromamba/envs/np2/lib/python3.12/site-packages/numpy/_core/fromnumeric.py", line 57, in _wrapfunc
    return bound(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/jni/micromamba/envs/np2/lib/python3.12/site-packages/numpy/_core/_methods.py", line 108, in _clip
    return um.clip(a, min, max, out=out, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OverflowError: Python integer 4550 out of bounds for uint8

I think most users in this scenario would expect to be able to use arrays of any type and Python ints and just have it work — there is nothing unsatisfiable about this expression, including keeping the input dtype.

When you start dealing with numpy scalars and so on, the story is indeed more complicated as noted in the above discussion, but the Python int scenario seems like an easy fix (as noted by @mhvk above) that would unlock a lot of uses already.

Sorry about the noise, I expect there will be lots in this repo this coming week. 😅 Thank you all! 🙏

mhvk · 2024-06-18T06:26:30Z

Indeed, a fix at least for python ints like for comparisons seems the way forward.

asmeurer · 2024-06-27T20:16:29Z

With the non-promoting behavior one should also be aware that you might not actually have min <= x <= max when min or max are float64 and x is float32, because of rounding in the downcast:

>>> np.clip(np.asarray([0.0], dtype=np.float32), np.asarray([4.0311033624323596e-209], dtype=np.float64), np.asarray([1.0], dtype=np.float64))
array([4.03110336e-209])
>>> _.astype(np.float32)
array([0.], dtype=float32)

That's underflow, but the rounding could work against you in virtually any case

>>> x = np.asarray([1.0], dtype=np.float32)
>>> min = np.asarray([1.00000001], dtype=np.float64)
>>> max = np.asarray([2.0], dtype=np.float64)
>>> np.clip(x, min, max).astype(np.float32)
array([1.], dtype=float32)
>>> min <= _
array([False])

Tbh, I'm starting to think this whole proposal is a bad idea and putting it in the standard was a mistake. Not type promoting implies downcasting, which just leads too many weird behaviors. But if we decide to keep it in the array API and change NumPy, we should figure out reasonable behavior for ints and document the float behavior.

jni · 2024-06-28T01:05:07Z

I don't find those examples that bad tbh. The test is not whether they are smaller than min, it is whether they are smaller than the clipped-and-rounded min. Just like I shouldn't be surprised that 0.1 + 0.2 <= 0.3 evaluates to False, I should not surprised by those edge cases. The only situation where I think clip should fail to do anything is when the min and max are both outside the range of the array dtype limits. ie np.clip(np.array([9], dtype=np.uint8), 400, 500) should be an error.

asmeurer · 2024-07-01T21:10:12Z

Well note that you don't get this issue at all (rounding or integer overflow) if you just do type promotion. The original argument here is that type promotion on x is surprising, but for me this shows why it is necessary. The whole point of type promotion in general is that functions can produce a result that fits within the bounds of both input dtypes.

mhvk · 2024-07-02T11:14:12Z

Well note that you don't get this issue at all (rounding or integer overflow) if you just do type promotion. The original argument here is that type promotion on x is surprising, but for me this shows why it is necessary. The whole point of type promotion in general is that functions can produce a result that fits within the bounds of both input dtypes.

I think I've gotten convinced about that too - I think the actionable part of this really just is to treat python ints specially (as we do for comparisons).

asmeurer · 2024-07-09T20:19:27Z

Another corner case I just discovered: clip promotes uint64 to float64 (I thought uint64 -> float promotion was removed in NumPy 2.0).

>>> np.clip(np.asarray(0, dtype=np.uint64), np.asarray(0), None)
np.float64(0.0)

fancidev · 2024-10-22T22:47:04Z

FWIW, in data-apis/array-api#814 (comment) I proposed the following semantics for clip:

Let T be the data type of x. If there exists a value in T that is greater than or equal to min and less than or equal to max, then clip(x, min, max) returns (1) the smallest element in T that is greater than or equal to min if x < min, (2) the largest element in T that is less than or equal to max if x > max, or (3) x otherwise.

In math notation, this becomes

$$ \mathrm{clip}(x,a,b):= {\arg\min}_{ t \in T\cap [a,b]} | t-x | $$

I think this definition is general enough to account for most (all?) corner cases of inputs. It is implementable for built-in types by first “snapping” the bounds to a representable value in T “in the right direction” and then doing the obvious min(max(…)) thing. For custom numeric types such as Decimal or Fraction, could just throw an error.

jni · 2024-10-23T06:14:15Z

Thanks @fancidev! That matches my intuition as well. Indeed, when $T \cap [a, b]$ is empty, the expression is undefined, right? ie it should raise an error, as I proposed above?

fancidev · 2024-10-23T09:37:24Z

when T ∩ [ a , b ] is empty, the expression is undefined, right?

Yes I think so.

ie it should raise an error, as I proposed above?

The behavior in that case could be tricky because numpy supports broadcasting of the arguments. What if some bounds are invalid and some are valid? Should it raise an error so that no result is produced at all? Or should it return nan (for floating point types) for those bounds that are invalid? How about integers, which don’t have a nan?

I don’t see an obvious answer, but maybe raising an error would be the most prudent.

asmeurer · 2024-10-23T19:18:50Z

clip has the following note:

    Notes
    -----
    When `a_min` is greater than `a_max`, `clip` returns an
    array in which all values are equal to `a_max`,
    as shown in the second example.

b < a is a special-case of T ∩ [a, b] being empty.

fancidev · 2024-10-23T23:19:56Z

b < a is a special-case of T ∩ [a, b] being empty.

Indeed, and that’s a useful hint. The C++17 standard for std::clamp contains a similar note:

If lo is greater than hi, the behavior is undefined.

A subtle difference is that in C++, v, lo, hi have the same type, so the above remark is exhaustive for undefined behavior. This is also the case as long as numpy performs type promotion. If numpy does not perform type promotion, then the remark becomes non-exhaustive (and thus a “special case”).

…s outside the bounds of `x` PR-URL: #814 Ref: numpy/numpy#24976 Co-authored-by: Athan Reines <kgryte@gmail.com> Reviewed-by: Athan Reines <kgryte@gmail.com>

kgryte mentioned this issue Jan 11, 2024

Add clip to the specification data-apis/array-api#715

Merged

asmeurer mentioned this issue Apr 19, 2024

2023.12 support data-apis/array-api-tests#249

Closed

21 tasks

asmeurer added a commit to asmeurer/array-api that referenced this issue Jun 13, 2024

Clarify that clip() behavior is undefined when min or max is outside …

5f2bb2e

…the bounds of x As discussed in today's consortium meeting. See the discussion at numpy/numpy#24976.

asmeurer mentioned this issue Jun 13, 2024

docs: clarify that clip behavior is undefined when min or max is outside the bounds of x data-apis/array-api#814

Merged

seberg mentioned this issue Jun 19, 2024

ENH: np.clip as currently implemented (numpy 2.0.0) is quite hard to use #26759

Closed

asmeurer mentioned this issue Jun 27, 2024

Issues with test_clip data-apis/array-api-tests#276

Closed

ndgrigorian mentioned this issue Jul 19, 2024

tensor.clip throws ValueError on input generated by array API test IntelPython/dpctl#1742

Closed

asmeurer mentioned this issue Jul 31, 2024

CI: Upgrade array-api-tests #27081

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: Ensure that output of np.clip has the same dtype as the main array #24976

ENH: Ensure that output of np.clip has the same dtype as the main array #24976

ENH: Ensure that output of np.clip has the same dtype as the main array #24976

ENH: Ensure that output of np.clip has the same dtype as the main array #24976

Comments

Proposed new feature or change: