-
Notifications
You must be signed in to change notification settings - Fork 226
Use new buffer interface to unpack #195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Other than possibly allowing |
Why? |
For 3.x, that is correct, as described here: https://docs.python.org/3/c-api/objbuffer.html As always 2.7 is the culprit. Since both protocols exist there, objects have to support them separately. Ironically,
Besides technical reasons, since these functions are deprecated since 3.0, we should IMO refrain from using them if possible. The only relevant builtin object that I know does not support the new protocol is |
@jfolz Thanks. How do you think dropping old buffer protocol? |
|
@methane Simple code is good code :) To sum it up:
Very few people will ever see a warning, but backward compatibility is kept in 0.5. Even less people will have to change their custom types that export the old protocol for 1.0. |
@@ -145,6 +147,9 @@ def unpackb(object packed, object object_hook=None, object list_hook=None, | |||
else: | |||
raise | |||
else: | |||
warnings.warn("unpacking %s requires old buffer protocol," |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this warning be issued after the attempt to get the buffer? If the object does not support any protocol (old or new), it will still result in a TypeError
, but the message may be a bit confusing.
a47f0f7
to
4b8bd7f
Compare
@@ -259,7 +259,8 @@ def feed(self, next_bytes): | |||
DeprecationWarning) | |||
else: | |||
raise | |||
assert view.itemsize == 1 | |||
if view.itemsize != 1: | |||
raise ValueError("cannot unpack from multi-byte object") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to raise a proper exception with explanation here. Multi-byte types are very problematic for fallback code, since there is not way to "cast" memoryview
to a different type.
Usually I'm all for developer responsibility, but honestly, if you're trying to unpack floats, I'm pretty sure you're doing it wrong.
These last few commits improve the consistency between Cython and fallback. Something that's still inconsistent is the behavior of Fallback attempts to use the old buffer protocol if possible/necessary and does not accept multi-byte types. Cython code only uses the new protocol and accepts multi-byte types. Should Cython be aligned with fallback here? |
else: | ||
PyObject_AsReadBuffer(packed, <const void**>&buf, &buf_len) | ||
PyErr_WarnEx(DeprecationWarning, | ||
"unpacking %s requires old buffer protocol," |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add a blank at the end of first string.
Last commit moves buffer access in Cython to a separate function to simplify code as well as guarantee unpack/unpackb/feed react identical to data. |
@@ -20,6 +19,7 @@ from cpython.exc cimport PyErr_WarnEx | |||
cdef extern from "Python.h": | |||
ctypedef struct PyObject | |||
cdef int PyObject_AsReadBuffer(object o, const void** buff, Py_ssize_t* buf_len) except -1 | |||
cdef int PyObject_GetBuffer(object o, Py_buffer *view, int flags) except -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you move this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exceptions were ignored (e.g., if array is not C-contiguous), which led to access violations.
Should I move related functions as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exceptions were ignored
Why?
https://github.com/cython/cython/blob/970c2fc0e676ca22016e14147ada0edba937dc6b/Cython/Includes/cpython/buffer.pxd#L31
Cython's declaration has except -1
. I don't know why exception were ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. I reverted this change and it had no effect.
At first I could not reproduce the problem, but I think the control flow was wrong, because Cython does not automatically exit a cdef
function when exceptions occur. Checking for errors and explicitly re-raising does the job.
b0fdcec
to
4d40e65
Compare
These commits should make memory handling more solid. The following kinds of data are rejected:
What I'm not satisfied with:
Overall, a lot of added/moved code. This was a lot less finished than I originally thought, really sorry about that. |
Thanks for your job. I'll review in this week hopefully. But my first impression is... I forget why we need such a big code. I'm afraid about we're over engineering. |
FYI, This is the code in getargs.c in cpython: static int
getbuffer(PyObject *arg, Py_buffer *view, const char **errmsg)
{
if (PyObject_GetBuffer(arg, view, PyBUF_SIMPLE) != 0) {
*errmsg = "bytes-like object";
return -1;
}
if (!PyBuffer_IsContiguous(view, 'C')) {
PyBuffer_Release(view);
*errmsg = "contiguous buffer";
return -1;
}
return 0;
} |
I agree that the buffer code is more complex than it could be. On contiguity checks: Sadly there is no simple way to check for it in 2.7/PyPy. I could add a pure Python port of the C code that does these checks, though that would be a lot of code: https://github.com/python/cpython/blob/master/Objects/abstract.c |
I think there are no reasy about rejecting data which |
Technically it is enough, but my main goal with this was to keep consistency between different runtimes. Idea: Fallback should be fine with sliced data. For Cython where we need contiguous data, we can try to get it using |
One important purpose of fallback module is supporting PyPy. |
self._fb_buf_n += len(next_bytes) | ||
self._fb_buffers.append(next_bytes) | ||
self._fb_buf_n += L | ||
self._fb_buffers.append(view) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do copy here.
Supporting memoryview makes it possible to avoid memory copy on user side, not here.
>>> import io
>>> f = io.BytesIO(b"foobar")
>>> buf = bytearray(4)
>>> n = f.readinto(buf)
>>> n
4
>>> buf
bytearray(b'foob')
>>> a = [memoryview(buf)[:n]]
>>> n = f.readinto(buf)
>>> n
2
>>> buf
bytearray(b'arob')
>>> a += [memoryview(buf)[:n]]
>>> a
[<memory at 0x10ee60288>, <memory at 0x10ee60348>]
>>> bytes(a[0])
b'arob' # Overwritten!!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More realistic sample code of .feed()
usage with memoryview.
unpacker = msgpack.Unpacker()
buf = bytearray(16*1024)
while True:
n = f.readinto(buf)
if n is None:
break
unpacker.feed(memoryview(buf)[:n]) # more efficient than buf[:n]
for obj in unpacker:
callback(obj)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we will be copying anyway it would make sense to use a single, larger buffer. I experimented in that direction, but it was quite complex.
Good point. We should suspend this PR until I can do that. |
Using PyPy3 2.4.0. PyPI (v0.4.7)
This PR (+copy in feed)
This PR (revert to bytes as buffers)
Seems like PyPy is well-optimized for bytes, not so much for memoryviews. |
PyObject_GetBuffer(contiguous, &tmp, PyBUF_SIMPLE) | ||
PyBuffer_Release(view) | ||
PyBuffer_FillInfo(view, contiguous, tmp.buf, tmp.len, 1, 0) | ||
Py_DECREF(contiguous) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What this does:
PyMemoryView_GetContiguous
creates amemoryview
object to a contiguous copy of the non-contiguous memory- Get a
Py_buffer
for this contiguous memory and transfer the information to the target buffer - Release resources as necessary
e573df7
to
31f76fb
Compare
Maybe related issue: http://bugs.python.org/issue20699 |
2.7 is a mess with regards to buffer handling. Working with this stuff makes me want to move to 3.5. |
I briefly tested Cython's typed memoryviews. Sadly, they require write access, so they don't work for most bytes-like objects. |
@methane Any updates on this? |
@@ -150,6 +195,8 @@ def unpackb(object packed, object object_hook=None, object list_hook=None, | |||
use_list, cenc, cerr, | |||
max_str_len, max_bin_len, max_array_len, max_map_len, max_ext_len) | |||
ret = unpack_construct(&ctx, buf, buf_len, &off) | |||
if new_protocol: | |||
PyBuffer_Release(&view); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try ~ finally is required, since unpack_construct, encoding.encode() and unicode_errors.encode() may raise Exception.
LGTM, except one comment. |
I added try-finally and squashed. Thanks for the help, looking forward to 0.5 :) |
Thanks |
This PR adds support for unpacking/feeding from any object that supports the new buffer interface.
For compatibility, an attempt is made to use the old buffer interface if that fails. On success, a
RuntimeWarning
is issued to inform users of possible errors and future removal of this feature.