-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
BUG,ENH: fix pickling user-scalars by allowing non-format buffer export #17295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
374c38b
to
c901a8d
Compare
if (a->format != NULL && b->format != NULL) { | ||
c = strcmp(a->format, b->format); | ||
if (c != 0) return c; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this looks risky - are you sure it shouldn't be:
if (a->format != NULL && b->format != NULL) { | |
c = strcmp(a->format, b->format); | |
if (c != 0) return c; | |
} | |
/* null format sorts before empty string */ | |
c = (a->format != NULL) - (b->format != NULL); | |
if (c != 0) return c; | |
if (a->format != NULL && b->format != NULL) { | |
c = strcmp(a->format, b->format); | |
if (c != 0) return c; | |
} |
Otherwise NULL
is considered equal to arbitrary strings, and equality is not transitive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, should be correct. if the format is NULL
it seems OK to replace it with any other format. Now if the old (first) format is NULL
, we will replace the NULL
with the actual (second) format. If the second format is NULL
, format is ignored completely, so it is fine as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An empty string would indicate that the itemsize is 0 in the exported buffer, I guess. A size change could seem problematic, but I do not think so.
In theory, changing the itemsize might be dangerous, but it is explicitly stored in the exported buffer info, so while it could be dangerous from a buffer user perspective, I do not think it is dangerous for having incorrect information inside the exported buffer information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does this replacing of NULL
happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numpy/core/src/multiarray/buffer.c
Outdated
info = _buffer_get_info(obj); | ||
info = _buffer_get_info(obj, | ||
(flags & PyBUF_F_CONTIGUOUS) == PyBUF_F_CONTIGUOUS, | ||
(flags & PyBUF_FORMAT) == PyBUF_FORMAT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's looking increasingly like maybe we should just pass in the full flags argument
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, fair point, could pass in the flags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still relevant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it and rebased. The rebase means the code changed slightly here (basically, I undid a bit of fixup that I did in the other PR to make this one easier, because I got it wrong...).
Previously we had code that would allow exporting the buffer, but then fail for any reasonable subclass, because such a subclass should have its own user-dtype. The change is, that now a subclass without its own user-dtype will inherit the correct behaviour directly. This allows pickling of of such user-defined scalars (with user-defined dtype) if no FORMAT was requested in the buffer export. The latter allows the generic pickling code to succeed. Closes numpygh-17294
This is necessary to allow pickling of the type object, which is necessary to test pickling of the scalar (and in arrays)
This also tests pickling as a regression test, since at least at this time it is directly related to the buffer export.
Test failure on 32bit linux looks real. Going to restart out of curiosity if it is reproducible (since I have currently no idea why it would fail). Failure
|
@@ -168,7 +168,7 @@ scalar_value(PyObject *scalar, PyArray_Descr *descr) | |||
* Use the alignment flag to figure out where the data begins | |||
* after a PyObject_HEAD | |||
*/ | |||
memloc = (npy_intp)scalar; | |||
memloc = (uintptr_t)scalar; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this casting to signed was the problem (and I first thought it can't be because the denominator is stored as a denom - 1
here making the 2
a 1
...
If the value is too large, it would be a negative which breaks the rounding (and also means that a previously aligned value looks unaligned).
Should we aim to backport this? It does fix an older regression. |
Does this need any documentation changes? I don't think pickling a user-scalar is very common, but will this impact any of the serialization protocols? |
Hmm, this fixes two things:
|
Ah, I guess its important to note that allowing buffers to always be exported only affects datetime and user-dtypes, so is a pretty niche thing, could add a release notes anyway... |
I think this is niche enough to not have a release note. Thanks @seberg |
This should close gh-17294 by allowing to export buffers for user defined types if the buffer export does not ask for the FORMAT to be filled it.
The diff is based on top of gh-16936 (which touches the same code), so marking it as draft.