8000 __reduce__ no longer works on user defined types · Issue #17294 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

__reduce__ no longer works on user defined types #17294

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
brian2brian opened this issue Sep 11, 2020 · 13 comments · Fixed by #17295
Closed

__reduce__ no longer works on user defined types #17294

brian2brian opened this issue Sep 11, 2020 · 13 comments · Fixed by #17295

Comments

@brian2brian
Copy link

I'm migrating from 1.13.3 to 1.17.4 and calling __reduce__ no longer works on our user defined types. Ultimately this stops me being able to pickle these objects. I've verified this problem is still present in the latest release 1.19.2 and the latest master.

Reproducing code example:

This can be demonstrated with the rational user defined type that is provided with numpy

python3
>>> import numpy.core._rational_tests
>>> numpy.core._rational_tests.rational(1,2).__reduce__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: cannot include dtype 'r' in a buffer

On 1.13.3 this worked fine

python3
>>> import numpy.core.test_rational
>>> numpy.core.test_rational.rational(1,2).__reduce__()
(<built-in function scalar>, (dtype(rational), b'\x01\x00\x00\x00\x01\x00\x00\x00'))

Numpy/Python version information:

1.17.4 3.8.2 (default, Jul 16 2020, 14:00:26)
[GCC 9.3.0]

Analysis

Looks like this arises from migrating to the new buffer protocol in #10564 involving @vanossj and @eric-wieser This invokes a different code path that calls _buffer_format_string and this does not support user defined types. In fact, PyObject_GetBuffer is being called with PyBUF_SIMPLE so it won't even use the results from _buffer_format_string.

One possible fix is to support user defined types in buffer.c:_buffer_format_string

From

        default:
            PyErr_Format(PyExc_ValueError,
                         "cannot include dtype '%c' in a buffer",
                         descr->type);
            return -1;
        }

To

        default:
            if (PyTypeNum_ISUSERDEF(descr->type_num))
            {
                if (_append_char(str, descr->kind) < 0) return -1;
                break;
            }            
            PyErr_Format(PyExc_ValueError,
                         "cannot include dtype '%c' in a buffer",
                         descr->type);
            return -1;
        }
@charris charris added 06 - Regression 09 - Backport-Candidate PRs tagged should be backported labels Sep 11, 2020
@charris
Copy link
Member
charris commented Sep 11, 2020

Note that if we fix this, it will not be backported to NumPy earlier than 1.19.

@seberg
Copy link
Member
seberg commented Sep 11, 2020

Thanks for the report and good analysis! I think the solution has to be to not build the format string (or leave it empty), and then raise an error later if the PyBUF_FORMAT was requested.
I am not quite sure yet how to best design that, especially since we need to retain the nice error message.
Alternatively, maybe we can always set the format string to NULL and fill it in if the format was requested later. That may actually be better, hmmm.

@brian2brian
Copy link
Author

Thanks for the interest. I'm fine with it not being backported all the way to 1.17 - I'm sure I'd be able to use a more recent version.

@seberg
Copy link
Member
seberg commented Sep 11, 2020

@brian2brian I have a fix, but rational.__module__ == "builtins" in the NumPy version at least. And that doesn't make any sense, so pickling fails again, just a bit later.

What dtype are you using (I am generally interested), and does pickling its scalar type object work (i.e. it has correct __module__ and __name__?)

@seberg
Copy link
Member
seberg commented Sep 11, 2020

Ok, I guess its just that tp_name of "rational" in NumPy is not reasonable, and I will fix that. I hope it is reasonable for you.

@brian2brian
Copy link
Author

Yes, rational makes for an easy test case but it won't pickle even if reduce works.

I have some types that represent things like dates and times but with carefully chosen representation to get the range and precision we want in the minimum size. Pickling and unpickling has been working just fine up until now including __module__ and __name__

@seberg
Copy link
Member
seberg commented Sep 11, 2020

@brian2brian gh-17295 should fix this.

Would you be able to share that code/project, especially if it has a test-suit? I am mainly curious, because I am doing larger changes, and knowing what you use, as well as additional testing to make sure we have deprecations rather than regressions, is always good. (In the long-term, chances are that all of such user-dtype/scalars will have to be reworked unfortunately).

@brian2brian
Copy link
Author

I wish I could share the code, but it belongs to my employer is and quite intertwined with other parts of the codebase
Looking through our tests we do a lot of obvious things like testing whatever operators and functions we defined
Less obvious things could be:

  • Using with other data types in structured fields
  • Creating a dtype from either the class or the object np.dtype(rational) or np.dtype(rational(1,2))
  • Pickling (as we have seen here)
  • Accessing the underlying data buffer with tobytes()
  • Using np.frombuffer to reconstruct the array from a data buffer and dtype

As long as we can still do these things then I think we'd be fine with any future refactoring

@seberg
Copy link
Member
seberg commented Sep 14, 2020

Thanks, those things seem fine, but it may be prudent to keep an eye on the/test NumPy 1.20 release if you can. There are big changes internally, and we have to expect some smaller external things. Although, I hope none of that will affect you.

Three things I would be worried about (But, my guess is that most of this isn't even possible):

  1. You dynamically create instances of your dtype (so to speak), i.e. not np.dtype(rational) but my_dtype(unit="ms"). For example NumPy datetime64 has the time unit as a parameter. If you use metadata or cmetadata in such a way, things might be relying on strange implementation details.
  2. One of your dtype.char is identical to an existing NumPy dtype.
  3. You have custom universal function, that use functionality beyond PyUFunc_RegisterLoopForDescr or PyUFunc_RegisterLoopForType (or creating a new ufunc with the NumPy API).

@seberg
Copy link
Member
seberg commented Sep 15, 2020

@brian2brian one thing that would be helpful, is if there is a public compiled version to try out. Looking at code is nice, but probably not really necessary. Another thing you could do (I know its probably a much to big ask) is to set up testing against up-to-date master, which is available at: https://anaconda.org/scipy-wheels-nightly/numpy

@charris
Copy link
Member
charris commented Sep 15, 2020

Note that the "nightly" builds are actually "weekly" :)

seberg added a commit to seberg/numpy that referenced this issue Oct 22, 2020
Previously we had code that would allow exporting the buffer, but
then fail for any reasonable subclass, because such a subclass
should have its own user-dtype.
The change is, that now a subclass without its own user-dtype will
inherit the correct behaviour directly.
This allows pickling of of such user-defined scalars (with
user-defined dtype) if no FORMAT was requested in the buffer export.

The latter allows the generic pickling code to succeed.

Closes numpygh-17294
@seberg
Copy link
Member
seberg commented Nov 11, 2020

@brian2brian would it be helpful for you if this is fixed in 1.19 or is it OK if we defer it to 1.20? We can backport it, but basically we are now in a fairly late iteration of the 1.19.x branch, so if it doesn't matter for you we will likely not backport.

@brian2brian
Copy link
Author

I'm happy to leave it for 1.20. Many thanks for your help with this.

@seberg seberg removed the 09 - Backport-Candidate PRs tagged should be backported label Nov 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants
0