reduce no longer works on user defined types #17294

brian2brian · 2020-09-11T14:01:32Z

I'm migrating from 1.13.3 to 1.17.4 and calling __reduce__ no longer works on our user defined types. Ultimately this stops me being able to pickle these objects. I've verified this problem is still present in the latest release 1.19.2 and the latest master.

Reproducing code example:

This can be demonstrated with the rational user defined type that is provided with numpy

python3
>>> import numpy.core._rational_tests
>>> numpy.core._rational_tests.rational(1,2).__reduce__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: cannot include dtype 'r' in a buffer

On 1.13.3 this worked fine

python3
>>> import numpy.core.test_rational
>>> numpy.core.test_rational.rational(1,2).__reduce__()
(<built-in function scalar>, (dtype(rational), b'\x01\x00\x00\x00\x01\x00\x00\x00'))

Numpy/Python version information:

1.17.4 3.8.2 (default, Jul 16 2020, 14:00:26)
[GCC 9.3.0]

Analysis

Looks like this arises from migrating to the new buffer protocol in #10564 involving @vanossj and @eric-wieser This invokes a different code path that calls _buffer_format_string and this does not support user defined types. In fact, PyObject_GetBuffer is being called with PyBUF_SIMPLE so it won't even use the results from _buffer_format_string.

One possible fix is to support user defined types in buffer.c:_buffer_format_string

From

        default:
            PyErr_Format(PyExc_ValueError,
                         "cannot include dtype '%c' in a buffer",
                         descr->type);
            return -1;
        }

To

        default:
            if (PyTypeNum_ISUSERDEF(descr->type_num))
            {
                if (_append_char(str, descr->kind) < 0) return -1;
                break;
            }            
            PyErr_Format(PyExc_ValueError,
                         "cannot include dtype '%c' in a buffer",
                         descr->type);
            return -1;
        }

The text was updated successfully, but these errors were encountered:

charris · 2020-09-11T14:38:34Z

Note that if we fix this, it will not be backported to NumPy earlier than 1.19.

seberg · 2020-09-11T14:38:47Z

Thanks for the report and good analysis! I think the solution has to be to not build the format string (or leave it empty), and then raise an error later if the PyBUF_FORMAT was requested.
I am not quite sure yet how to best design that, especially since we need to retain the nice error message.
Alternatively, maybe we can always set the format string to NULL and fill it in if the format was requested later. That may actually be better, hmmm.

brian2brian · 2020-09-11T15:35:42Z

Thanks for the interest. I'm fine with it not being backported all the way to 1.17 - I'm sure I'd be able to use a more recent version.

seberg · 2020-09-11T16:58:10Z

@brian2brian I have a fix, but rational.__module__ == "builtins" in the NumPy version at least. And that doesn't make any sense, so pickling fails again, just a bit later.

What dtype are you using (I am generally interested), and does pickling its scalar type object work (i.e. it has correct __module__ and __name__?)

seberg · 2020-09-11T17:00:01Z

Ok, I guess its just that tp_name of "rational" in NumPy is not reasonable, and I will fix that. I hope it is reasonable for you.

brian2brian · 2020-09-11T17:18:22Z

Yes, rational makes for an easy test case but it won't pickle even if reduce works.

I have some types that represent things like dates and times but with carefully chosen representation to get the range and precision we want in the minimum size. Pickling and unpickling has been working just fine up until now including __module__ and __name__

seberg · 2020-09-11T17:31:59Z

@brian2brian gh-17295 should fix this.

Would you be able to share that code/project, especially if it has a test-suit? I am mainly curious, because I am doing larger changes, and knowing what you use, as well as additional testing to make sure we have deprecations rather than regressions, is always good. (In the long-term, chances are that all of such user-dtype/scalars will have to be reworked unfortunately).

brian2brian · 2020-09-14T09:45:38Z

I wish I could share the code, but it belongs to my employer is and quite intertwined with other parts of the codebase
Looking through our tests we do a lot of obvious things like testing whatever operators and functions we defined
Less obvious things could be:

Using with other data types in structured fields
Creating a dtype from either the class or the object np.dtype(rational) or np.dtype(rational(1,2))
Pickling (as we have seen here)
Accessing the underlying data buffer with tobytes()
Using np.frombuffer to reconstruct the array from a data buffer and dtype

As long as we can still do these things then I think we'd be fine with any future refactoring

seberg · 2020-09-14T23:56:29Z

Thanks, those things seem fine, but it may be prudent to keep an eye on the/test NumPy 1.20 release if you can. There are big changes internally, and we have to expect some smaller external things. Although, I hope none of that will affect you.

Three things I would be worried about (But, my guess is that most of this isn't even possible):

You dynamically create instances of your dtype (so to speak), i.e. not np.dtype(rational) but my_dtype(unit="ms"). For example NumPy datetime64 has the time unit as a parameter. If you use metadata or cmetadata in such a way, things might be relying on strange implementation details.
One of your dtype.char is identical to an existing NumPy dtype.
You have custom universal function, that use functionality beyond PyUFunc_RegisterLoopForDescr or PyUFunc_RegisterLoopForType (or creating a new ufunc with the NumPy API).

seberg · 2020-09-15T18:11:35Z

@brian2brian one thing that would be helpful, is if there is a public compiled version to try out. Looking at code is nice, but probably not really necessary. Another thing you could do (I know its probably a much to big ask) is to set up testing against up-to-date master, which is available at: https://anaconda.org/scipy-wheels-nightly/numpy

charris · 2020-09-15T18:32:19Z

Note that the "nightly" builds are actually "weekly" :)

Previously we had code that would allow exporting the buffer, but then fail for any reasonable subclass, because such a subclass should have its own user-dtype. The change is, that now a subclass without its own user-dtype will inherit the correct behaviour directly. This allows pickling of of such user-defined scalars (with user-defined dtype) if no FORMAT was requested in the buffer export. The latter allows the generic pickling code to succeed. Closes numpygh-17294

seberg · 2020-11-11T17:11:28Z

@brian2brian would it be helpful for you if this is fixed in 1.19 or is it OK if we defer it to 1.20? We can backport it, but basically we are now in a fairly late iteration of the 1.19.x branch, so if it doesn't matter for you we will likely not backport.

brian2brian · 2020-11-11T20:23:50Z

I'm happy to leave it for 1.20. Many thanks for your help with this.

charris added 06 - Regression 09 - Backport-Candidate PRs tagged should be backported labels Sep 11, 2020

seberg mentioned this issue Sep 11, 2020

BUG: Fix memory leak of buffer-info cache due to relaxed strides #16936

Merged

seberg mentioned this issue Sep 11, 2020

BUG,ENH: fix pickling user-scalars by allowing non-format buffer export #17295

Merged

DumbMice mentioned this issue Sep 29, 2020

ENH: add neon intrinsics to fft radix2&4 #17231

Closed

mattip closed this as completed in #17295 Nov 3, 2020

seberg removed the 09 - Backport-Candidate PRs tagged should be backported label Nov 11, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

reduce no longer works on user defined types #17294

reduce no longer works on user defined types #17294

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

__reduce__ no longer works on user defined types #17294

__reduce__ no longer works on user defined types #17294

Comments

Reproducing code example:

Numpy/Python version information:

Analysis

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

reduce no longer works on user defined types #17294

reduce no longer works on user defined types #17294