8000 BUG: fix incorrect output descriptor in fancy indexing by ngoldbaum · Pull Request #27715 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

BUG: fix incorrect output descriptor in fancy indexing #27715

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Nov 13, 2024

Conversation

ngoldbaum
Copy link
Member
@ngoldbaum ngoldbaum commented Nov 7, 2024

Fixes #27710. Fixes #27737.

Updates a casting check in nditer internals to a stricter check that considers view safety.

seberg
seberg previously approved these changes Nov 7, 2024
Copy link
Member
@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this looks right extra_op is the result for indexing (things may be more interesting for assignments).

It's surprising why it's already used in the first branch, but not the second. If not a recent fix, maybe it's just a left-over of how the code grew, though.

I am surprised that the dtype mattered here, but probably it just is another way to trigger the full path (is to.e. a multi-dimensional advanced index would have done the same thing).

Shouldn't the test cover both short and large ones? But I agree, large are the interesting ones.

@ngoldbaum
Copy link
Member Author
ngoldbaum commented Nov 7, 2024

Yeah, this all dates back to when you originally wrote this code in 2022 according to the git history (unless that was a copy/paste). Probably just used to the idiom of assuming the input and output descriptor is identical in the indexing code.

Shouldn't the test cover both short and large ones? But I agree, large are the interesting ones.

Thanks for the suggestion, I expanded the test a bit and got another seg fault 🙃. Will push a supplementary fixup hopefully soon...

@seberg seberg self-requested a review November 7, 2024 17:03
@seberg seberg dismissed their stale review November 7, 2024 17:03

Have to rethink this...

@seberg
Copy link
Member
seberg commented Nov 7, 2024

Sorry, I think I have to go a bit deeper and think about it once more (should have read the comment closely).

I think in this branch, we may have a set-up a cast from self.dtype -> buffer.dtype where the buffer is created with the dtype of self. The actual cast (if there was one) would be a second step from the buffer to the result:

  self -> buffer(dtype=self.dtype) -> result

and we are setting up the first part only here, from self -> buffer(dtype=self.dtype).

However, result.dtype == self.dtype because this isn't assignment!

EDIT: Sorry, I bargled up the first version, edited and added comment.

@seberg
Copy link
Member
seberg commented Nov 7, 2024

On the "up-side": All of this should really only matter for StringDType.

@ngoldbaum
Copy link
Member Author

It took me a while to reduce it, but this is the segfault I hit after expanding the tests:

a = np.array(["hello", "world"], dtype="T")
rop = ['d'*25, 'e'*25]
ind = np.array([0, 1], dtype='int8')

b = np.array(rop, "T")
a[ind] = b

This dies inside array_assign_subscript.

@ngoldbaum
Copy link
Member Author

Ugh, I think this runs into what you and I were talking about here #27057 (comment), which ultimately requires updating MapIter internals.

@seberg
Copy link
Member
seberg commented Nov 8, 2024

Yeah, it had dawned to me also after the first too brief review. Should we try to figure that out together some time early next week? The issue isn't going away...

@ngoldbaum
Copy link
Member Author

Yeah if you'd be up for a pairing session in the morning my time and afternoon/evening your time I think that would be great. Monday is probably best for me but Wednesday works too.

npy_intp view_offset = NPY_MIN_INTP;
if (op[iop] != NULL && !(PyArray_SafeCast(
PyArray_DESCR(op[iop]), op_dtype[iop], &view_offset,
NPY_NO_CASTING, 1) && view_offset == 0)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do wonder if we should have a clear way to spell it (up to changing the meaning of PyArray_EquivTypes to match this and changing the use where we want the meaning that allows string_dtype1 == string_dtype2 where needed).

Copy link
Member
@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, after looking at it closer today, I realized that my worries /flip-back was in part due to getting the code paths exactly mixed up :).

The original fix looks very much right to me, although I won't swear that there are not more such problems around!

If you think it's all good, this seems fine to me (including for backporting) @ngoldbaum.

We may have a bunch more of such issues around EquivTypes though!

@ngoldbaum
Copy link
Member Author

Assuming CI goes through OK I'll merge and open a followup to audit all the EquivTypes uses. If they're used to check casting safety they should be updated.

@ngoldbaum ngoldbaum merged commit da32320 into numpy:main Nov 13, 2024
67 of 69 checks passed
@ngoldbaum
Copy link
Member Author

I created #27751 to track the followups we identified before merging this.

@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Nov 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants
0