|
| 1 | +PEP: 688 |
| 2 | +Title: Making the buffer protocol accessible in Python |
| 3 | +Author: Jelle Zijlstra <jelle.zijlstra@gmail.com> |
| 4 | +Status: Draft |
| 5 | +Type: Standards Track |
| 6 | +Content-Type: text/x-rst |
| 7 | +Created: 23-Apr-2022 |
| 8 | +Python-Version: 3.12 |
| 9 | + |
| 10 | + |
| 11 | +Abstract |
| 12 | +======== |
| 13 | + |
| 14 | +This PEP proposes a mechanism for Python code to inspect whether a |
| 15 | +type supports the C-level buffer protocol. This allows type |
| 16 | +checkers to evaluate whether objects implement the protocol. |
| 17 | + |
| 18 | + |
| 19 | +Motivation |
| 20 | +========== |
| 21 | + |
| 22 | +The CPython C API provides a versatile mechanism for accessing the |
| 23 | +underlying memory of an object—the buffer protocol from :pep:`3118`. |
| 24 | +Functions that accept binary data are usually written to handle any |
| 25 | +object implementing the buffer protocol. For example, at the time of writing, |
| 26 | +there are around 130 functions in CPython using the Argument Clinic |
| 27 | +``Py_buffer`` type, which accepts any object implementing the buffer protocol. |
| 28 | + |
| 29 | +Currently, there is no way for Python code to inspect whether an object |
| 30 | +supports the buffer protocol. Moreover, the static type system |
| 31 | +does not provide a type annotation to represent the protocol. |
| 32 | +This is a `common problem <https://github.com/python/typing/issues/593>`__ |
| 33 | +when writing type annotations for code that accepts generic buffers. |
| 34 | + |
| 35 | + |
| 36 | +Rationale |
| 37 | +========= |
| 38 | + |
| 39 | +Current options |
| 40 | +--------------- |
| 41 | + |
| 42 | +There are two current workarounds for annotating buffer types in |
| 43 | +the type system, but neither is adequate. |
| 44 | + |
| 45 | +First, the `current workaround <https://github.com/python/typeshed/blob/2a0fc1b582ef84f7a82c0beb39fa617de2539d3d/stdlib/_typeshed/__init__.pyi#L194>`__ |
| 46 | +for buffer types in typeshed is a type alias |
| 47 | +that lists well-known buffer types in the standard library, such as |
| 48 | +``bytes``, ``bytearray``, ``memoryview``, and ``array.array``. This |
| 49 | +approach works for the standard library, but it does not extend to |
| 50 | +third-party buffer types. |
| 51 | + |
| 52 | +Second, the `documentation <https://docs.python.org/3.10/library/typing.html#typing.ByteString>`__ |
| 53 | +for ``typing.ByteString`` currently states: |
| 54 | + |
| 55 | + This type represents the types ``bytes``, ``bytearray``, and |
| 56 | + ``memoryview`` of byte sequences. |
| 57 | + |
| 58 | + As a shorthand for this type, ``bytes`` can be used to annotate |
| 59 | + arguments of any of the types mentioned above. |
| 60 | + |
| 61 | +Although the second sentence has been in the documentation |
| 62 | +`since 2015 <https://github.com/python/cpython/commit/2a19d956ab92fc9084a105cc11292cb0438b322f>`__, |
| 63 | +the use of ``bytes`` to include these other types is not specified |
| 64 | +in any of the typing PEPs. Furthermore, this mechanism has a number of |
| 65 | +problems. It does not include all possible buffer types, and it |
| 66 | +makes the ``bytes`` type ambiguous in type annotations. After all, |
| 67 | +there are many operations that are valid on ``bytes`` objects, but |
| 68 | +not on ``memoryview`` objects, and it is perfectly possible for |
| 69 | +a function to accept ``bytes`` but not ``memoryview`` objects. |
| 70 | +A mypy user |
| 71 | +`reports <https://github.com/python/mypy/issues/12643#issuecomment-1105914159>`__ |
| 72 | +that this shortcut has caused significant problems for the ``psycopg`` project. |
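
The ambiguity described above can be reproduced at runtime. The following is a minimal sketch, where ``shout`` is a hypothetical function (not from the PEP) annotated to take ``bytes``. A type checker that implements the shorthand would accept a ``memoryview`` argument, yet the call fails because ``memoryview`` has no ``upper()`` method.

```python
def shout(data: bytes) -> bytes:
    # upper() exists on bytes (and bytearray), but not on memoryview.
    return data.upper()


result = shout(b"abc")

try:
    # Accepted by a checker that treats ``bytes`` as a shorthand,
    # but it fails when executed.
    shout(memoryview(b"abc"))
except AttributeError as exc:
    failure = exc
else:
    failure = None
```
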
| 73 | + |
| 74 | +Kinds of buffers |
| 75 | +---------------- |
| 76 | + |
| 77 | +The C buffer protocol supports |
| 78 | +`many options <https://docs.python.org/3.10/c-api/buffer.html#buffer-request-types>`__, |
| 79 | +affecting strides, contiguity, and support for writing to the buffer. Some of these |
| 80 | +options would be useful in the type system. For example, typeshed |
| 81 | +currently provides separate type aliases for writable and read-only |
| 82 | +buffers. |
| 83 | + |
| 84 | +However, in the C buffer protocol, these options cannot be |
| 85 | +queried directly on the type object. The only way to figure out |
| 86 | +whether an object supports a writable buffer is to actually |
| 87 | +ask for the buffer. For some types, such as ``memoryview``, |
| 88 | +whether the buffer is writable depends on the instance: |
| 89 | +some instances are read-only and others are not. As such, we propose to |
| 90 | +expose only whether a type implements the buffer protocol at |
| 91 | +all, not whether it supports more specific options such as |
| 92 | +writable buffers. |
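
The instance-dependent writability mentioned above can be observed from Python through the ``readonly`` attribute of ``memoryview``. This is only an illustrative sketch using standard library types: both objects below have the same type, yet only one of them exposes a writable buffer.

```python
readonly_view = memoryview(b"abc")             # backed by immutable bytes
writable_view = memoryview(bytearray(b"abc"))  # backed by a mutable bytearray

# Same type, different answers: writability is a property of the instance.
flags = (readonly_view.readonly, writable_view.readonly)

writable_view[0] = ord("x")  # allowed: the underlying buffer is writable

try:
    readonly_view[0] = ord("x")
except TypeError as exc:
    error = exc  # writing through a read-only view is refused
else:
    error = None
```
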
| 93 | + |
| 94 | +Specification |
| 95 | +============= |
| 96 | + |
| 97 | +types.Buffer |
| 98 | +------------ |
| 99 | + |
| 100 | +A new class, ``types.Buffer``, will be added. It cannot be instantiated or |
| 101 | +subclassed at runtime, but supports the ``__instancecheck__`` and |
| 102 | +``__subclasscheck__`` hooks. In CPython, these will check for the presence of the |
| 103 | +``bf_getbuffer`` slot in the type object: |
| 104 | + |
| 105 | +.. code-block:: pycon |
| 106 | + |
| 107 | +   >>> from types import Buffer |
| 108 | +   >>> isinstance(b"xy", Buffer) |
| 109 | +   True |
| 110 | +   >>> issubclass(bytes, Buffer) |
| 111 | +   True |
| 112 | +   >>> issubclass(memoryview, Buffer) |
| 113 | +   True |
| 114 | +   >>> isinstance("xy", Buffer) |
| 115 | +   False |
| 116 | +   >>> issubclass(str, Buffer) |
| 117 | +   False |
| 118 | + |
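
For readers who want to experiment before ``types.Buffer`` exists, the ``__instancecheck__`` idea can be approximated in pure Python. The sketch below is not the PEP's implementation: ``BufferLike`` and ``_BufferLikeMeta`` are hypothetical names, and instead of inspecting the ``bf_getbuffer`` slot the check simply asks ``memoryview`` for a buffer and releases it again.

```python
class _BufferLikeMeta(type):
    """Metaclass supplying the isinstance() hook (hypothetical name)."""

    def __instancecheck__(cls, obj):
        # Approximation: request a buffer and release it immediately.
        try:
            with memoryview(obj):
                return True
        except TypeError:
            return False


class BufferLike(metaclass=_BufferLikeMeta):
    """Hypothetical pure-Python stand-in; not the PEP's types.Buffer."""


checks = [
    isinstance(obj, BufferLike)
    for obj in (b"xy", bytearray(b"xy"), "xy", 42)
]
```

This only approximates the proposed behavior. It needs an instance, so it cannot answer ``issubclass`` queries, and it reports whether a buffer request succeeds rather than whether the type defines the slot.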
| 119 | +The new class can also be used in type annotations: |
| 120 | + |
| 121 | +.. code-block:: python |
| 122 | + |
| 123 | +   def need_buffer(b: Buffer) -> memoryview: |
| 124 | +       return memoryview(b) |
| 125 | + |
| 126 | +   need_buffer(b"xy")  # ok |
| 127 | +   need_buffer("xy")  # rejected by static type checkers |
| 128 | + |
| 129 | +Usage in stub files |
| 130 | +------------------- |
| 131 | + |
| 132 | +For static typing purposes, types defined in C extensions usually |
| 133 | +require stub files, as :pep:`described in PEP 484 <484#stub-files>`. |
| 134 | +In stub files, ``types.Buffer`` may be used as a base class to |
| 135 | +indicate that a class implements the buffer protocol. |
| 136 | + |
| 137 | +For example, ``memoryview`` may be declared as follows in a stub: |
| 138 | + |
| 139 | +.. code-block:: python |
| 140 | + |
| 141 | +   class memoryview(types.Buffer, Sized, Sequence[int]): |
| 142 | +       ... |
| 143 | + |
| 144 | +The ``types.Buffer`` class does not require any special treatment |
| 145 | +by type checkers. |
| 146 | + |
| 147 | +Equivalent for older Python versions |
| 148 | +------------------------------------ |
| 149 | + |
| 150 | +New typing features are usually backported to older Python versions |
| 151 | +in the `typing_extensions <https://pypi.org/project/typing-extensions/>`_ |
| 152 | +package. Because the buffer protocol |
| 153 | +is accessible only in C, ``types.Buffer`` cannot be implemented |
| 154 | +in a pure-Python package like ``typing_extensions``. As a temporary |
| 155 | +workaround, a ``typing_extensions.Buffer`` |
| 156 | +`abstract base class <Buffer ABC_>`__ will be provided for Python versions |
| 157 | +that do not have ``types.Buffer`` available. |
| 158 | + |
| 159 | +For the benefit of |
| 160 | +static type checkers, ``typing_extensions.Buffer`` can be used as |
| 161 | +a base class in stubs to mark types as supporting the buffer protocol. |
| 162 | +For runtime uses, the ``ABC.register`` API can be used to register |
| 163 | +buffer classes with ``typing_extensions.Buffer``. |
| 164 | + |
| 165 | +When ``types.Buffer`` is available, ``typing_extensions`` should simply |
| 166 | +re-export it. Thus, users who register their buffer class manually |
| 167 | +with ``typing_extensions.Buffer.register`` should use a guard to make |
| 168 | +sure their code continues to work once ``types.Buffer`` is in the |
| 169 | +standard library. |
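
The guard mentioned above might look like the following sketch. To keep it runnable without third-party packages, a local ABC stands in for ``typing_extensions.Buffer`` (an assumption made purely for illustration), and ``array.array`` plays the role of the user's buffer class.

```python
import array
import types
from abc import ABC

if hasattr(types, "Buffer"):
    # Draft behavior from this PEP: the real class is simply re-exported.
    Buffer = types.Buffer
else:
    class Buffer(ABC):
        """Local stand-in for the typing_extensions.Buffer ABC fallback."""

# The guard: register manually only when Buffer is the ABC fallback.
# A class relying on __instancecheck__ is not assumed to offer register().
if hasattr(Buffer, "register"):
    Buffer.register(array.array)

is_buffer = isinstance(array.array("b", [1, 2, 3]), Buffer)
```

Checking for the ``register`` attribute, rather than for a Python version, keeps the code independent of which release first ships the class.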
| 170 | + |
| 171 | + |
| 172 | +No special meaning for ``bytes`` |
| 173 | +-------------------------------- |
| 174 | + |
| 175 | +The special case stating that ``bytes`` may be used as a shorthand |
| 176 | +for other ``ByteString`` types will be removed from the ``typing`` |
| 177 | +documentation. |
| 178 | +With ``types.Buffer`` available as an alternative, there will be no good |
| 179 | +reason to allow ``bytes`` as a shorthand. |
| 180 | +We suggest that type checkers currently implementing this behavior |
| 181 | +should deprecate and eventually remove it. |
| 182 | + |
| 183 | + |
| 184 | +Backwards Compatibility |
| 185 | +======================= |
| 186 | + |
| 187 | +As the runtime changes in this PEP only add a new class, there are |
| 188 | +no backwards compatibility concerns. |
| 189 | + |
| 190 | +However, the recommendation to remove the special behavior for |
| 191 | +``bytes`` in type checkers does have a backwards compatibility |
| 192 | +impact on their users. An `experiment <https://github.com/python/mypy/pull/12661>`__ |
| 193 | +with mypy shows that several major open source projects that use it |
| 194 | +for type checking will see new errors if the ``bytes`` promotion |
| 195 | +is removed. Many of these errors can be fixed by improving |
| 196 | +the stubs in typeshed, as has already been done for the |
| 197 | +`builtins <https://github.com/python/typeshed/pull/7631>`__, |
| 198 | +`binascii <https://github.com/python/typeshed/pull/7677>`__, |
| 199 | +`pickle <https://github.com/python/typeshed/pull/7678>`__, and |
| 200 | +`re <https://github.com/python/typeshed/pull/7679>`__ modules. |
| 201 | +Overall, the change improves type safety and makes the type system |
| 202 | +more consistent, so we believe the migration cost is worth it. |
| 203 | + |
| 204 | + |
| 205 | +How to Teach This |
| 206 | +================= |
| 207 | + |
| 208 | +We will add notes pointing to ``types.Buffer`` in appropriate places in the |
| 209 | +documentation, such as `typing.readthedocs.io <https://typing.readthedocs.io/en/latest/>`__ |
| 210 | +and the `mypy cheat sheet <https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html>`__. |
| 211 | +Type checkers may provide additional pointers in their error messages. For example, |
| 212 | +when they encounter a buffer object being passed to a function that |
| 213 | +is annotated to only accept ``bytes``, the error message could include a note suggesting |
| 214 | +the use of ``types.Buffer`` instead. |
| 215 | + |
| 216 | + |
| 217 | +Reference Implementation |
| 218 | +======================== |
| 219 | + |
| 220 | +An implementation of ``types.Buffer`` is |
| 221 | +`available <https://github.com/python/cpython/compare/main...JelleZijlstra:typesbuffer?expand=1>`__ |
| 222 | +in the author's fork. |
| 223 | + |
| 224 | + |
| 225 | +Rejected Ideas |
| 226 | +============== |
| 227 | + |
| 228 | +Buffer ABC |
| 229 | +---------- |
| 230 | + |
| 231 | +An `earlier proposal <https://github.com/python/cpython/issues/71688>`__ suggested |
| 232 | +adding a ``collections.abc.Buffer`` |
| 233 | +`abstract base class <https://docs.python.org/3/glossary.html#term-abstract-base-class>`__ |
| 234 | +to represent buffer objects. This idea |
| 235 | +stalled because an ABC with no methods does not fit well into the ``collections.abc`` |
| 236 | +module. Furthermore, it required manual registration of buffer classes, including |
| 237 | +those in the standard library. This PEP's approach of using the ``__instancecheck__`` |
| 238 | +hook is more natural and does not require explicit registration. |
| 239 | + |
| 240 | +Nevertheless, the ABC proposal has the advantage that it does not require C changes. |
| 241 | +This PEP proposes to adopt a version of it in the third-party ``typing_extensions`` |
| 242 | +package for the benefit of users of older Python versions. |
| 243 | + |
| 244 | +Keep ``bytearray`` compatible with ``bytes`` |
| 245 | +-------------------------------------------- |
| 246 | + |
| 247 | +It has been suggested to remove the special case where ``memoryview`` is |
| 248 | +always compatible with ``bytes``, but keep it for ``bytearray``, because |
| 249 | +the two types have very similar interfaces. However, several standard |
| 250 | +library functions (e.g., ``re.compile`` and ``socket.getaddrinfo``) accept |
| 251 | +``bytes`` but not ``bytearray``. In most codebases, ``bytearray`` is also |
| 252 | +not a very common type. We prefer to have users spell out accepted types |
| 253 | +explicitly (or use ``Protocol`` from :pep:`544` if only a specific set of |
| 254 | +methods is required). |
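
The ``re.compile`` case can be checked directly. This is a small sketch using only the standard library; the variable names are illustrative.

```python
import re

# A bytes pattern compiles normally.
pattern = re.compile(b"a+")

try:
    # A bytearray pattern is refused, even though the two types
    # share most of their interface.
    re.compile(bytearray(b"a+"))
except TypeError:
    bytearray_rejected = True
else:
    bytearray_rejected = False
```
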
| 255 | + |
| 256 | + |
| 257 | +Open Issues |
| 258 | +=========== |
| 259 | + |
| 260 | +Read-only and writable buffers |
| 261 | +------------------------------ |
| 262 | + |
| 263 | +To avoid making changes to the buffer protocol itself, this PEP currently |
| 264 | +does not provide a way to distinguish between read-only and writable buffers. |
| 265 | +This is unfortunate, because some APIs require a writable buffer, and one of |
| 266 | +the most common buffer types (``bytes``) is always read-only. |
| 267 | +Should we add a new mechanism in C to declare that a type implementing the |
| 268 | +buffer protocol is potentially writable? |
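
As an illustration of an API that needs a writable buffer, consider ``io.BytesIO.readinto``. In this sketch (standard library only, illustrative variable names), both arguments would satisfy a plain "is a buffer" check, but only the ``bytearray`` is accepted.

```python
import io

stream = io.BytesIO(b"abcd")

# A bytearray exposes a writable buffer, so readinto() can fill it.
target = bytearray(4)
count = stream.readinto(target)

stream.seek(0)
try:
    # bytes also implements the buffer protocol, but only read-only.
    stream.readinto(bytes(4))
except TypeError:
    readonly_rejected = True
else:
    readonly_rejected = False
```
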
| 269 | + |
| 270 | + |
| 271 | +Copyright |
| 272 | +========= |
| 273 | + |
| 274 | +This document is placed in the public domain or under the |
| 275 | +CC0-1.0-Universal license, whichever is more permissive. |