8000 BUG: Fix concatenation when the output is "S" or "U" by seberg · Pull Request #18052 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

BUG: Fix concatenation when the output is "S" or "U" #18052

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 23, 2020

Conversation

seberg
Copy link
Member
@seberg seberg commented Dec 21, 2020

Previously, the dtype was used, this now assumes that we want to
cast to a string of (unknown) length. This is a simplified version
of what happens in np.array() or arr.astype() (it does never
inspect the values, e.g. for object casts).

This is more complex as I would like, and with the refactor of
ResultType and similar can be cleaned up a bit more hopefully.

Note that currently, object to "S" or "U" casts simply return
length 64 strings, but with the new version, this will be an error
(although the error message probably needs improvement).
This is a behaviour inherited from other places however.


The issue here is that NumPy 1.20 is just broken if you pass dtype="U" in concatenate. Unfortunately, this is about as "minimal" as I could think of (some things around it should be cleaned up also in the new code paths). Things are simply fairly complicated if you have "flexible dtypes" or DType (type/class) in my way of thinking about it...

There is of course more inanity here, e.g. concatenate uses result_type which uses value-based casting for 0-D objects, but there is not much to do about it right at this instance.

@charris
Copy link
Member
charris commented Dec 21, 2020

Is this a regression in 1.20.0?

@seberg
Copy link
Member Author
seberg commented Dec 21, 2020

Its a bug in a new feature in 1.20. If this feels like to much, we could pull that out in principle (although we should do this anyway probably).

The reason for the dtype argument, was because I noted that np.concatenate((["string"], [0])) would be the one weird feature I could think of that people might actually use which would go away if we deprecate number to string promotion. Without this fix though, you can't write dtype="S" to get the old result (without worrying about the string length), and you actually get a bad result.

So yeah, from my perspective backporting is slightly easier, but as the deprecation will only be in 1.21, it probably doesn't matter.

@charris
Copy link
Member
charris commented Dec 21, 2020

but as the deprecation will only be in 1.21

What deprecation? Number to string? What I'd like to know is if 1.20 will break current usage without a fix.

@seberg
Copy link
Member Author
seberg commented Dec 21, 2020

@charris sorry. Current usage is unchanged since 1.20 added the keyword argument (and this change does not modify that). If you use that new feature you can run into bugs (possibly including crashes).

@seberg seberg force-pushed the concat-with-string-dtype branch from ef084dc to cc99c61 Compare December 22, 2020 00:01
Previously, the dtype was used, this now assumes that we want to
cast to a string of (unknown) length.  This is a simplified version
of what happens in `np.array()` or `arr.astype()` (it does never
inspect the values, e.g. for object casts).

This is more complex as I would like, and with the refactor of
ResultType and similar can be cleaned up a bit more hopefully.

Note that currently, object to "S" or "U" casts simply return
length 64 strings, but with the new version, this will be an error
(although the error message probably needs improvement).
This is a behaviour inherited from other places however.
@seberg seberg force-pushed the concat-with-string-dtype branch from cc99c61 to 50dce51 Compare December 22, 2020 00:03
NULL);
ret = (PyArrayObject *)PyArray_NewFromDescr_int(
subtype, descr, 1, &shape, &stride, NULL, 0, NULL,
NULL, 0, 1);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually changes nothing, but eventually, we may want to not do weird things about dtype="S0" and then it will be necessary.

@charris charris added the 09 - Backport-Candidate PRs tagged should be backported label Dec 23, 2020
@charris charris added this to the 1.20.0 release milestone Dec 23, 2020
@charris charris merged commit cd50d88 into numpy:master Dec 23, 2020
@charris
Copy link
Member
charris commented Dec 23, 2020

Let's give it a shot. Thanks Sebastian.

@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Dec 23, 2020
@seberg seberg deleted the concat-with-string-dtype branch December 23, 2020 22:14
@seberg
Copy link
Member Author
seberg commented Dec 23, 2020

Thanks, sorry for missing this originally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0