Commit 0cb0066

PEP 688: Making the buffer protocol accessible in Python (#2549)
1 parent 24dec44 commit 0cb0066

2 files changed

+276
-0
lines changed

.github/CODEOWNERS

Lines changed: 1 addition & 0 deletions
@@ -568,6 +568,7 @@ pep-0684.rst @ericsnowcurrently
 pep-0685.rst @brettcannon
 pep-0686.rst @methane
 pep-0687.rst @encukou
+pep-0688.rst @jellezijlstra
 # ...
 # pep-0754.txt
 # ...

pep-0688.rst

Lines changed: 275 additions & 0 deletions
@@ -0,0 +1,275 @@
PEP: 688
Title: Making the buffer protocol accessible in Python
Author: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 23-Apr-2022
Python-Version: 3.12


Abstract
========

This PEP proposes a mechanism for Python code to inspect whether a
type supports the C-level buffer protocol. This allows type
checkers to evaluate whether objects implement the protocol.


Motivation
==========

The CPython C API provides a versatile mechanism for accessing the
underlying memory of an object: the buffer protocol from :pep:`3118`.
Functions that accept binary data are usually written to handle any
object implementing the buffer protocol. For example, at the time of writing,
there are around 130 functions in CPython using the Argument Clinic
``Py_buffer`` type, which accepts the buffer protocol.

Currently, there is no way for Python code to inspect whether an object
supports the buffer protocol. Moreover, the static type system
does not provide a type annotation to represent the protocol.
This is a `common problem <https://github.com/python/typing/issues/593>`__
when writing type annotations for code that accepts generic buffers.


Rationale
=========

Current options
---------------

There are two current workarounds for annotating buffer types in
the type system, but neither is adequate.

First, the `current workaround <https://github.com/python/typeshed/blob/2a0fc1b582ef84f7a82c0beb39fa617de2539d3d/stdlib/_typeshed/__init__.pyi#L194>`__
for buffer types in typeshed is a type alias
that lists well-known buffer types in the standard library, such as
``bytes``, ``bytearray``, ``memoryview``, and ``array.array``. This
approach works for the standard library, but it does not extend to
third-party buffer types.

Second, the `documentation <https://docs.python.org/3.10/library/typing.html#typing.ByteString>`__
for ``typing.ByteString`` currently states:

    This type represents the types ``bytes``, ``bytearray``, and
    ``memoryview`` of byte sequences.

    As a shorthand for this type, ``bytes`` can be used to annotate
    arguments of any of the types mentioned above.

Although this sentence has been in the documentation
`since 2015 <https://github.com/python/cpython/commit/2a19d956ab92fc9084a105cc11292cb0438b322f>`__,
the use of ``bytes`` to include these other types is not specified
in any of the typing PEPs. Furthermore, this mechanism has a number of
problems. It does not include all possible buffer types, and it
makes the ``bytes`` type ambiguous in type annotations. After all,
there are many operations that are valid on ``bytes`` objects, but
not on ``memoryview`` objects, and it is perfectly possible for
a function to accept ``bytes`` but not ``memoryview`` objects.
A mypy user
`reports <https://github.com/python/mypy/issues/12643#issuecomment-1105914159>`__
that this shortcut has caused significant problems for the ``psycopg`` project.
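
To make the ambiguity concrete, here is a minimal sketch (an illustration only, not part of this PEP's specification) of a function that is annotated with ``bytes`` and relies on a ``bytes``-only method. The helper name ``shout`` is hypothetical. A type checker that treats ``memoryview`` as compatible with ``bytes`` would accept both calls, yet the second fails at runtime.

```python
def shout(data: bytes) -> bytes:
    """Hypothetical helper that relies on a method only bytes-like sequences have."""
    return data.upper()


payload = b"abc"
view = memoryview(payload)

# Works: bytes objects provide .upper().
result = shout(payload)

# A checker applying the "bytes shorthand" would accept this call,
# but memoryview objects have no .upper() method.
try:
    shout(view)
    view_failed = False
except AttributeError:
    view_failed = True
```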

Kinds of buffers
----------------

The C buffer protocol supports
`many options <https://docs.python.org/3.10/c-api/buffer.html#buffer-request-types>`__,
affecting strides, contiguity, and support for writing to the buffer. Some of these
options would be useful in the type system. For example, typeshed
currently provides separate type aliases for writable and read-only
buffers.

However, in the C buffer protocol, these options cannot be
queried directly on the type object. The only way to figure out
whether an object supports a writable buffer is to actually
ask for the buffer. For some types, such as ``memoryview``,
whether the buffer is writable depends on the instance:
some instances are read-only and others are not. As such, we propose to
expose only whether a type implements the buffer protocol at
all, not whether it supports more specific options such as
writable buffers.
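
The per-instance nature of writability can be observed from Python today through the existing ``memoryview.readonly`` attribute. The following short sketch is only an illustration of the point above; the variable names are arbitrary.

```python
# Two instances of the same type (memoryview) with different writability.
readonly_view = memoryview(b"xy")             # backed by immutable bytes
writable_view = memoryview(bytearray(b"xy"))  # backed by a mutable bytearray

flags = (readonly_view.readonly, writable_view.readonly)

# Writing through the writable view succeeds.
writable_view[0] = ord("z")

# Writing through the read-only view is rejected at runtime.
try:
    readonly_view[0] = ord("z")
    write_rejected = False
except TypeError:
    write_rejected = True
```

Because both objects have the type ``memoryview``, no check on the type alone could tell them apart.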

Specification
=============

types.Buffer
------------

A new class, ``types.Buffer``, will be added. It cannot be instantiated or
subclassed at runtime, but supports the ``__instancecheck__`` and
``__subclasscheck__`` hooks. In CPython, these will check for the presence of the
``bf_getbuffer`` slot in the type object:

.. code-block:: pycon

    >>> from types import Buffer
    >>> isinstance(b"xy", Buffer)
    True
    >>> issubclass(bytes, Buffer)
    True
    >>> issubclass(memoryview, Buffer)
    True
    >>> isinstance("xy", Buffer)
    False
    >>> issubclass(str, Buffer)
    False

The new class can also be used in type annotations:

.. code-block:: python

    def need_buffer(b: Buffer) -> memoryview:
        return memoryview(b)

    need_buffer(b"xy")  # ok
    need_buffer("xy")  # rejected by static type checkers
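
For readers unfamiliar with the ``__instancecheck__`` hook, the following pure-Python sketch approximates the runtime behaviour on interpreters without ``types.Buffer``. It is not the proposed implementation: the names ``BufferLike`` and ``_BufferLikeMeta`` are hypothetical, and instead of inspecting the ``bf_getbuffer`` slot (which is not visible from Python) it assumes that successfully creating a ``memoryview`` is an acceptable proxy. It therefore actually requests a buffer, and it covers ``isinstance`` only, not ``issubclass``.

```python
class _BufferLikeMeta(type):
    """Hypothetical metaclass: approximates the check by requesting a buffer."""

    def __instancecheck__(cls, instance):
        try:
            # memoryview() raises TypeError for objects that do not
            # implement the buffer protocol.
            with memoryview(instance):
                return True
        except TypeError:
            return False


class BufferLike(metaclass=_BufferLikeMeta):
    """Hypothetical stand-in for the proposed types.Buffer (isinstance only)."""


checks = {
    "bytes": isinstance(b"xy", BufferLike),
    "bytearray": isinstance(bytearray(b"xy"), BufferLike),
    "str": isinstance("xy", BufferLike),
    "int": isinstance(3, BufferLike),
}
```

Unlike the slot check described above, this approximation could misreport an exporter that implements the protocol but refuses a particular buffer request.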

Usage in stub files
-------------------

For static typing purposes, types defined in C extensions usually
require stub files, as :pep:`described in PEP 484 <484#stub-files>`.
In stub files, ``types.Buffer`` may be used as a base class to
indicate that a class implements the buffer protocol.

For example, ``memoryview`` may be declared as follows in a stub:

.. code-block:: python

    class memoryview(types.Buffer, Sized, Sequence[int]):
        ...

The ``types.Buffer`` class does not require any special treatment
by type checkers.

Equivalent for older Python versions
------------------------------------

New typing features are usually backported to older Python versions
in the `typing_extensions <https://pypi.org/project/typing-extensions/>`_
package. Because the buffer protocol
is accessible only in C, ``types.Buffer`` cannot be implemented
in a pure-Python package like ``typing_extensions``. As a temporary
workaround, a ``typing_extensions.Buffer``
`abstract base class <Buffer ABC_>`__ will be provided for Python versions
that do not have ``types.Buffer`` available.

For the benefit of
static type checkers, ``typing_extensions.Buffer`` can be used as
a base class in stubs to mark types as supporting the buffer protocol.
For runtime uses, the ``ABC.register`` API can be used to register
buffer classes with ``typing_extensions.Buffer``.

When ``types.Buffer`` is available, ``typing_extensions`` should simply
re-export it. Thus, users who register their buffer class manually
with ``typing_extensions.Buffer.register`` should use a guard to make
sure their code continues to work once ``types.Buffer`` is in the
standard library.
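
One possible shape for such a guard is sketched below. This is an assumption about how user code might be written, not something this PEP prescribes. To keep the example self-contained, ``FallbackBuffer`` is a local stand-in for ``typing_extensions.Buffer``, and ``MyBuffer`` is a hypothetical placeholder for a buffer type defined in a C extension.

```python
import abc
import types


class MyBuffer:
    """Hypothetical placeholder for a buffer type defined in a C extension."""


class FallbackBuffer(abc.ABC):
    """Local stand-in for the typing_extensions.Buffer ABC described above."""


# types.Buffer is the name proposed by this PEP; getattr() avoids an
# AttributeError on interpreters where it does not exist.
Buffer = getattr(types, "Buffer", FallbackBuffer)

# Guard: register manually only when the ABC fallback is in use. The class
# proposed by this PEP relies on __instancecheck__ instead of registration.
if Buffer is FallbackBuffer:
    Buffer.register(MyBuffer)
```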


No special meaning for ``bytes``
--------------------------------

The special case stating that ``bytes`` may be used as a shorthand
for other ``ByteString`` types will be removed from the ``typing``
documentation.
With ``types.Buffer`` available as an alternative, there will be no good
reason to allow ``bytes`` as a shorthand.
We suggest that type checkers currently implementing this behavior
should deprecate and eventually remove it.


Backwards Compatibility
=======================

As the runtime changes in this PEP only add a new class, there are
no backwards compatibility concerns.

However, the recommendation to remove the special behavior for
``bytes`` in type checkers does have a backwards compatibility
impact on their users. An `experiment <https://github.com/python/mypy/pull/12661>`__
with mypy shows that several major open source projects that use it
for type checking will see new errors if the ``bytes`` promotion
is removed. Many of these errors can be fixed by improving
the stubs in typeshed, as has already been done for the
`builtins <https://github.com/python/typeshed/pull/7631>`__,
`binascii <https://github.com/python/typeshed/pull/7677>`__,
`pickle <https://github.com/python/typeshed/pull/7678>`__, and
`re <https://github.com/python/typeshed/pull/7679>`__ modules.
Overall, the change improves type safety and makes the type system
more consistent, so we believe the migration cost is worth it.


How to Teach This
=================

We will add notes pointing to ``types.Buffer`` in appropriate places in the
documentation, such as `typing.readthedocs.io <https://typing.readthedocs.io/en/latest/>`__
and the `mypy cheat sheet <https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html>`__.
Type checkers may provide additional pointers in their error messages. For example,
when they encounter a buffer object being passed to a function that
is annotated to only accept ``bytes``, the error message could include a note suggesting
the use of ``types.Buffer`` instead.


Reference Implementation
========================

An implementation of ``types.Buffer`` is
`available <https://github.com/python/cpython/compare/main...JelleZijlstra:typesbuffer?expand=1>`__
in the author's fork.


Rejected Ideas
==============

Buffer ABC
----------

An `earlier proposal <https://github.com/python/cpython/issues/71688>`__ suggested
adding a ``collections.abc.Buffer``
`abstract base class <https://docs.python.org/3/glossary.html#term-abstract-base-class>`__
to represent buffer objects. This idea
stalled because an ABC with no methods does not fit well into the ``collections.abc``
module. Furthermore, it required manual registration of buffer classes, including
those in the standard library. This PEP's approach of using the ``__instancecheck__``
hook is more natural and does not require explicit registration.

Nevertheless, the ABC proposal has the advantage that it does not require C changes.
This PEP proposes to adopt a version of it in the third-party ``typing_extensions``
package for the benefit of users of older Python versions.

Keep ``bytearray`` compatible with ``bytes``
--------------------------------------------

It has been suggested to remove the special case where ``memoryview`` is
always compatible with ``bytes``, but keep it for ``bytearray``, because
the two types have very similar interfaces. However, several standard
library functions (e.g., ``re.compile`` and ``socket.getaddrinfo``) accept
``bytes`` but not ``bytearray``. In most codebases, ``bytearray`` is also
not a very common type. We prefer to have users spell out accepted types
explicitly (or use ``Protocol`` from :pep:`544` if only a specific set of
methods is required).
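
As an illustration of the ``Protocol`` alternative, the sketch below declares only the one method a function needs. The names ``SupportsHex`` and ``to_hex`` are hypothetical and chosen for this example; ``hex()`` is used because ``bytes``, ``bytearray``, and ``memoryview`` all provide it.

```python
from typing import Protocol


class SupportsHex(Protocol):
    """Hypothetical protocol: any object with a hex() method returning str."""

    def hex(self) -> str: ...


def to_hex(data: SupportsHex) -> str:
    # Relies only on the method named in the protocol, so the accepted
    # types are spelled out structurally rather than via a bytes shorthand.
    return data.hex()


raw = b"\x01\xff"
results = [to_hex(raw), to_hex(bytearray(raw)), to_hex(memoryview(raw))]
```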


Open Issues
===========

Read-only and writable buffers
------------------------------

To avoid making changes to the buffer protocol itself, this PEP currently
does not provide a way to distinguish between read-only and writable buffers.
That's unfortunate, because some APIs require a writable buffer, and one of
the most common buffer types (``bytes``) is always read-only.
Should we add a new mechanism in C to declare that a type implementing the
buffer protocol is potentially writable?


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.
