8000 BUG,ENH: fix pickling user-scalars by allowing non-format buffer export by seberg · Pull Request #17295 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

BUG,ENH: fix pickling user-scalars by allowing non-format buffer export #17295

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Nov 3, 2020

Conversation

seberg
Copy link
Member
@seberg seberg commented Sep 11, 2020

This should close gh-17294 by allowing to export buffers for user defined types if the buffer export does not ask for the FORMAT to be filled it.

The diff is based on top of gh-16936 (which touches the same code), so marking it as draft.

Comment on lines +580 to +583
if (a->format != NULL && b->format != NULL) {
c = strcmp(a->format, b->format);
if (c != 0) return c;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this looks risky - are you sure it shouldn't be:

Suggested change
if (a->format != NULL && b->format != NULL) {
c = strcmp(a->format, b->format);
if (c != 0) return c;
}
/* null format sorts before empty string */
c = (a->format != NULL) - (b->format != NULL);
if (c != 0) return c;
if (a->format != NULL && b->format != NULL) {
c = strcmp(a->format, b->format);
if (c != 0) return c;
}

Otherwise NULL is considered equal to arbitrary strings, and equality is not transitive

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, should be correct. if the format is NULL it seems OK to replace it with any other format. Now if the old (first) format is NULL, we will replace the NULL with the actual (second) format. If the second format is NULL, format is ignored completely, so it is fine as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An empty string would indicate that the itemsize is 0 in the exported buffer, I guess. A size change could seem problematic, but I do not think so.

In theory, changing the itemsize might be dangerous, but it is explicitly stored in the exported buffer info, so while it could be dangerous from a buffer user perspective, I do not think it is dangerous for having incorrect information inside the exported buffer information.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does this replacing of NULL happen?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

info = _buffer_get_info(obj);
info = _buffer_get_info(obj,
(flags & PyBUF_F_CONTIGUOUS) == PyBUF_F_CONTIGUOUS,
(flags & PyBUF_FORMAT) == PyBUF_FORMAT);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's looking increasingly like maybe we should just pass in the full flags argument

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, fair point, could pass in the flags.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still relevant?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it and rebased. The rebase means the code changed slightly here (basically, I undid a bit of fixup that I did in the other PR to make this one easier, because I got it wrong...).

Previously we had code that would allow exporting the buffer, but
then fail for any reasonable subclass, because such a subclass
should have its own user-dtype.
The change is, that now a subclass without its own user-dtype will
inherit the correct behaviour directly.
This allows pickling of of such user-defined scalars (with
user-defined dtype) if no FORMAT was requested in the buffer export.

The latter allows the generic pickling code to succeed.

Closes numpygh-17294
This is necessary to allow pickling of the type object, which is
necessary to test pickling of the scalar (and in arrays)
This also tests pickling as a regression test, since at least
at this time it is directly related to the buffer export.
@seberg seberg marked this pull request as ready for review October 22, 2020 17:02
@seberg
Copy link
Member Author
seberg commented Oct 22, 2020

Test failure on 32bit linux looks real. Going to restart out of curiosity if it is reproducible (since I have currently no idea why it would fail).

Failure
=================================== FAILURES ===================================
_______ TestNewBufferProtocol.test_export_and_pickle_user_dtype[scalar] ________

self = <numpy.core.tests.test_multiarray.TestNewBufferProtocol object at 0xec9d1238>
obj = rational(1,2), error = <class 'TypeError'>

    @pytest.mark.parametrize(["obj", "error"], [
            pytest.param(np.array([1, 2], dtype=rational), ValueError, id="array"),
            pytest.param(rational(1, 2), TypeError, id="scalar")])
    def test_export_and_pickle_user_dtype(self, obj, error):
        # User dtypes should export successfully when FORMAT was not requested.
        with pytest.raises(error):
            _multiarray_tests.get_buffer_info(obj, ("STRIDED", "FORMAT"))
    
        _multiarray_tests.get_buffer_info(obj, ("STRIDED",))
    
        # This is currently also necessary to implement pickling:
        res = pickle.loads(pickle.dumps(obj))
>       assert_array_equal(res, obj)
E       AssertionError: 
E       Arrays are not equal
E       
E       Mismatched elements: 1 / 1 (100%)
E       Max absolute difference: rational(255,514)
E       Max relative difference: rational(255,257)
E        x: array(rational(1,257), dtype=rational)
E        y: array(rational(1,2), dtype=rational)

error      = <class 'TypeError'>
obj        = rational(1,2)
res        = rational(1,257)
self       = <numpy.core.tests.test_multiarray.TestNewBufferProtocol object at 0xec9d1238>

../../venv/lib/python3.8/site-packages/numpy/core/tests/test_multiarray.py:7159: AssertionError

@@ -168,7 +168,7 @@ scalar_value(PyObject *scalar, PyArray_Descr *descr)
* Use the alignment flag to figure out where the data begins
* after a PyObject_HEAD
*/
memloc = (npy_intp)scalar;
memloc = (uintptr_t)scalar;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this casting to signed was the problem (and I first thought it can't be because the denominator is stored as a denom - 1 here making the 2 a 1...

If the value is too large, it would be a negative which breaks the rounding (and also means that a previously aligned value looks unaligned).

@seberg
Copy link
Member Author
seberg commented Oct 27, 2020

Should we aim to backport this? It does fix an older regression.

@mattip mattip requested a review from eric-wieser November 2, 2020 20:41
@mattip
Copy link
Member
mattip commented Nov 3, 2020

Does this need any documentation changes? I don't think pickling a user-scalar is very common, but will this impact any of the serialization protocols?

@seberg
Copy link
Member Author
seberg commented Nov 3, 2020

Hmm, this fixes two things:

  1. Exporting buffers without a requested format now always works
  2. The fix about user-scalars is a regression. If it was not, I am not sure I would worry about it.

@seberg
Copy link
Member Author
seberg commented Nov 3, 2020

Ah, I guess its important to note that allowing buffers to always be exported only affects datetime and user-dtypes, so is a pretty niche thing, could add a release notes anyway...

@mattip mattip merged commit d62b0ee into numpy:master Nov 3, 2020
@mattip
Copy link
Member
mattip commented Nov 3, 2020

I think this is niche enough to not have a release note. Thanks @seberg

@seberg seberg deleted the issue-17294 branch November 3, 2020 16:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

__reduce__ no longer works on user defined types
4 participants
0