-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
DEP: Deprecate promotion to strings #15925
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
To be a bit more clear as to what this does, the main thing is that code such as:
will all be deprecated, forcing you for example to write |
Ah, one problem is that the "Default Type Resolver" will not honor this, since it uses safe-casting and not type promotion. But... That is probably a no-issue since in practice we simply do not have string loops, it is just an example of how safe-promotion is IMO not a future-proof way to handle things. |
e8da69f
to
85178c6
Compare
85178c6
to
c39cbb0
Compare
I came back to it because I would prefer to fix this (I think it actually simplifies to have this for a new DType system), but it is a bit of a rabbit hole. It is probably convering, but there are two things to figure out:
So, to be honest, this may only make sense if we are willing to do the full blown |
@@ -81,6 +82,12 @@ _array_find_python_scalar_type(PyObject *op) | |||
* PyArray_DTypeFromObject encountered a string type, and that the recursive | |||
* search must be restarted so that string representation lengths can be | |||
* computed for all scalar types. | |||
* | |||
* DEPRECATED NumPy 19.0, 2020-03 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* DEPRECATED NumPy 19.0, 2020-03 | |
* DEPRECATED NumPy 1.19.0, 2020-03 |
I'm not 100% sold on this. A nice feature of object arrays is that users can pass a list of random things into a numpy function like The main issues I can think of with object arrays are:
Neither of which seems particularly problematic here. |
True, the question is really if We currently have an annoying try/except in our code which was giving me a bit of headache. But the fix is probably that this try/except is at the wrong place: It should be around the promotion itself only (plus around trying to convert to a sequence). Arrays containing arrays are an issue, but it is the same one as the ragged-array for the most part. So right, I would be happy to keep the try/except around the promotion (effectively going to |
Anyway, that sounds like a good idea (going to object for array coercion), and moving the try/except block, that should be doable, so I think I will explore that. |
For users who only care about Can you think of a gotcha case where concatenating to object there is likely to cause bugs or confusion? |
Right, but its also an incorrect answer, for example:
Edit: similar to concatenate result, where that part behaves different from the previous part. |
Oops, don't type while chatting, I was still at the current strings... Those of course would produce the correct/same answer in most cases, although certain functions would not work or may not work the same (e.g. |
Ok, I suppose you could argue that the object ufunc should work on an array containing the scalar object (albeit possibly very slow) and that some of them do not is simply a current shortcoming. Maybe I mainly have a bit of a dislike to the fact that The main argument is maybe that it is not necessarily a good generic fallback. It is in theory possible for two types that currently have no common dtype, to have on added later on. But I am not sure that disallowing the common type/promotion actually helps with such a transition. |
I will probably bring this up on the list some time (so expect pretty much a copy of this there), I would like to reach a consensus on the long term plan here. Independent of what Falling back to object technically should give the correct result, but possibly in a slow way and confusing way. The main danger code like:
returns an We currently explicitly disallow a fallback to object loops in ufuncs and generally do not have any in type promotion to object (e.g.
Which arguably should also fall back to object if string and int does. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I may be missing some context, Overall I agree that np.array([..]) with string and int dtypes should error unless dtype is explicitly provided. I also checked the pandas repo tests and it seems to be expecting a failure for str + np.array operation in pandas/tests/arithmetic/test_object.py
.
There are places in the test though which do things like np.array([2, "foo"]) which may fail after this change.
EDIT: It won't fail since its a deprecation but will throw dep warning.
* since it would give a spurious DeprecationWarning. | ||
*/ | ||
PyErr_Format(PyExc_TypeError, | ||
"No loop matching the specified signature and " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
making this consistent with the other ufuncloop error message may avoid changes on pandas side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, now that you said that. I actually tried to see what happens to pandas, and fixing this a bit was part of that. Although it seemed like a lot of warnings and I wasn't sure how to turn them to errors quickly in the run to actually see better whether or not they matter.
Fixing tests like np.array([2, "foo"])
do not matter, they are quick and easy to fix. But if it is relied on somewhere deeper, we may have to see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although it seemed like a lot of warnings and I wasn't sure how to turn them to errors quickly in the run to actually see better whether or not they matter.
I just monkey patched your code to return -1 irrespective of deprecate call result 😅
@seberg is this meant to be part of the ufunc reworking for 1.20 or do we let it slide to a future release? |
Let it slide. Its important on its own IMO and it would have helped if we had it in 1.19, but at this point it doesn't really matter. Marking this as draft, only to remind myself. This has to be redone in the new code and with a different ugly kludge. |
@seberg Is this still needed? |
Yeah, definitely... But just started looking at this, and basically nothing from this PR is still useful, so closing. I probably have to make this a
|
This may need a bit of discussion, and unfortunately I think there was a reason why I did not open the PR before, but I do not remember what it was. So there might be a hole in PR somewhere (as of now it mainly needs deprecation tests, although the existing tests probably effectively cover most things).
The basic idea is that
np.promote_types()
and thus alsonp.result_type()
of any datatype and string should be deprecated. It is possible to cast an integer to string (safe or not is up for discussion), but they do not have a reasonable "common type" operation. It is not obvious that concatenating a string and an integer array is a string array, it could be an object array, butobject
arrays have enough quirks that I would not ever consider promoting to them by default a good idea (except when "object" is already involved).Maybe you could talk about allowing that with an
np.allow_object
singleton or kwarg, in which case when promotion returns an error, you just use object instead.It is possible that my hesitation was the fact that an
np.allow_object
singleton might be necessary for downstream compatibility.Closes gh-15897
EDIT: Ah, there is a bit of a mess here with the type resolution errors that I had not cleaned up.