| 1 | +PEP: 734 |
| 2 | +Title: Multiple Interpreters in the Stdlib |
| 3 | +Author: Eric Snow <ericsnowcurrently@gmail.com> |
| 4 | +Status: Draft |
| 5 | +Type: Standards Track |
| 6 | +Content-Type: text/x-rst |
| 7 | +Created: 06-Nov-2023 |
| 8 | +Python-Version: 3.13 |
| 9 | + |
| 10 | + |
| 11 | +Abstract |
| 12 | +======== |
| 13 | + |
| 14 | +I propose that we add a new module, "interpreters", to the standard |
| 15 | +library, to make the existing multiple-interpreters feature of CPython |
| 16 | +more easily accessible to Python code. This is particularly relevant |
| 17 | +now that we have a per-interpreter GIL (:pep:`684`) and people are |
| 18 | +more interested in using multiple interpreters. Without a stdlib |
| 19 | +module, users are limited to the `C-API`_, which restricts how much |
| 20 | +they can try out and take advantage of multiple interpreters. |
| 21 | + |
| 22 | +.. _C-API: |
| 23 | + https://docs.python.org/3/c-api/init.html#sub-interpreter-support |
| 24 | + |
| 25 | + |
| 26 | +Rationale |
| 27 | +========= |
| 28 | + |
| 29 | +The ``interpreters`` module will provide a high-level interface to the |
| 30 | +multiple interpreter functionality. Since we have no experience with |
| 31 | +how users will make use of multiple interpreters in Python code, we are |
| 32 | +purposefully keeping the initial API as lean and minimal as possible. |
| 33 | +The objective is to provide a well-considered foundation on which we may |
| 34 | +add more-advanced functionality later. |
| 35 | + |
| 36 | +That said, the proposed design incorporates lessons learned from |
| 37 | +existing use of subinterpreters by the community, from existing stdlib |
| 38 | +modules, and from other programming languages. It also factors in |
| 39 | +experience from using subinterpreters in the CPython test suite and |
| 40 | +using them in `concurrency benchmarks`_. |
| 41 | + |
| 42 | +.. _concurrency benchmarks: |
| 43 | + https://github.com/ericsnowcurrently/concurrency-benchmarks |
| 44 | + |
| 45 | +The module will include a basic mechanism for communicating between |
| 46 | +interpreters. Without one, multiple interpreters are a much less |
| 47 | +useful feature. |
| 48 | + |
| 49 | + |
| 50 | +Specification |
| 51 | +============= |
| 52 | + |
| 53 | +The module will: |
| 54 | + |
| 55 | +* expose the existing multiple interpreter support |
| 56 | +* introduce a basic mechanism for communicating between interpreters |
| 57 | + |
| 58 | +The module will wrap a new low-level ``_interpreters`` module |
| 59 | +(in the same way as the ``threading`` module). However, that low-level |
| 60 | +API is not intended for public use and thus not part of this proposal. |
| 61 | + |
| 62 | +We also expect that an ``InterpreterPoolExecutor`` will be added to the |
| 63 | +``concurrent.futures`` module, but that is outside the scope of this PEP. |
| 64 | + |
| 65 | +API: Using Interpreters |
| 66 | +----------------------- |
| 67 | + |
| 68 | +The module's top-level API for managing interpreters looks like this: |
| 69 | + |
| 70 | ++----------------------------------+----------------------------------------------+ |
| 71 | +| signature | description | |
| 72 | ++==================================+==============================================+ |
| 73 | +| ``list_all() -> [Interpreter]`` | Get all existing interpreters. | |
| 74 | ++----------------------------------+----------------------------------------------+ |
| 75 | +| ``get_current() -> Interpreter`` | Get the currently running interpreter. | |
| 76 | ++----------------------------------+----------------------------------------------+ |
| 77 | +| ``create() -> Interpreter`` | Initialize a new (idle) Python interpreter. | |
| 78 | ++----------------------------------+----------------------------------------------+ |
| 79 | + |
| 80 | +Each interpreter object: |
| 81 | + |
| 82 | ++----------------------------------+------------------------------------------------+ |
| 83 | +| signature | description | |
| 84 | ++==================================+================================================+ |
| 85 | +| ``class Interpreter`` | A single interpreter. | |
| 86 | ++----------------------------------+------------------------------------------------+ |
| 87 | +| ``.id`` | The interpreter's ID (read-only). | |
| 88 | ++----------------------------------+------------------------------------------------+ |
| 89 | +| ``.is_running() -> bool`` | Is the interpreter currently executing code? | |
| 90 | ++----------------------------------+------------------------------------------------+ |
| 91 | +| ``.set_main_attrs(**kwargs)`` | Bind objects in ``__main__``. | |
| 92 | ++----------------------------------+------------------------------------------------+ |
| 93 | +| ``.exec(code, /)`` | | Run the given source code in the interpreter | |
| 94 | +| | | (in the current thread). | |
| 95 | ++----------------------------------+------------------------------------------------+ |
| 96 | + |
| 97 | +Additional details: |
| 98 | + |
| 99 | +* Every ``Interpreter`` instance wraps an ``InterpreterID`` object. |
| 100 | + When there are no more references to an interpreter's ID, it gets |
| 101 | + finalized. Thus no interpreters created through |
| 102 | + ``interpreters.create()`` will leak. |
| 103 | + |
| 104 | +| |
| 105 | + |
| 106 | +* ``Interpreter.is_running()`` reports only whether a thread is |
| 107 | + running a script (code) in the interpreter's ``__main__`` module. |
| 108 | + That essentially means whether ``Interpreter.exec()`` is running |
| 109 | + in some thread. Code running in sub-threads is ignored. |
| 110 | + |
| 111 | +| |
| 112 | + |
| 113 | +* ``Interpreter.set_main_attrs()`` will only allow (for now) objects |
| 114 | + that are specifically supported for passing between interpreters. |
| 115 | + See `Shareable Objects`_. |
| 116 | + |
| 117 | +| |
| 118 | + |
| 119 | +* ``Interpreter.set_main_attrs()`` is helpful for initializing the |
| 120 | + globals for an interpreter before running code in it. |
| 121 | + |
| 122 | +| |
| 123 | + |
| 124 | +* ``Interpreter.exec()`` resets neither the interpreter's state nor |
| 125 | + the ``__main__`` module, before or after it runs, so each |
| 126 | + successive call picks up where the last one left off. This can |
| 127 | + be useful for running some code to initialize an interpreter |
| 128 | + (e.g. with imports) before later performing some repeated task. |
| 129 | + |
| 130 | +Comparison with builtins.exec() |
| 131 | +------------------------------- |
| 132 | + |
| 133 | +``Interpreter.exec()`` is essentially the same as the builtin |
| 134 | +``exec()``, except it targets a different interpreter, using that |
| 135 | +interpreter's isolated state. |
| 136 | + |
| 137 | +The builtin ``exec()`` runs in the current OS thread and pauses |
| 138 | +whatever was running there, which resumes when ``exec()`` finishes. |
| 139 | +No other threads are affected. (To avoid pausing the current thread, |
| 140 | +run ``exec()`` in a ``threading.Thread``.) |
| 141 | + |
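A minimal sketch of the parenthetical above, using only the builtin ``exec()`` and ``threading`` (the proposed module is not involved). The namespace name ``ns`` and the source string are illustrative assumptions.

```python
import threading

# Namespace that the executed code will populate.
ns = {}

# Run the builtin exec() in its own thread so the calling thread is not paused.
t = threading.Thread(target=exec, args=("total = sum(range(10))", ns))
t.start()
# The calling thread is free to do other work here.
t.join()

print(ns["total"])
```

The same pattern is what the worker examples in this PEP rely on: the blocking call is wrapped in a ``threading.Thread`` target.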
| 142 | +``Interpreter.exec()`` works the same way. |
| 143 | + |
| 144 | +The builtin ``exec()`` takes a namespace against which it executes. |
| 145 | +It uses that namespace as-is and does not clear it before or after. |
| 146 | + |
| 147 | +``Interpreter.exec()`` works the same way. |
| 148 | + |
| 149 | +...with one slight difference: the namespace is implicit |
| 150 | +(the ``__main__`` module's ``__dict__``). This matches how |
| 151 | +scripts run from the Python command line or in the REPL. |
| 152 | + |
| 153 | +The builtin ``exec()`` discards any object returned from the |
| 154 | +executed code. |
| 155 | + |
| 156 | +``Interpreter.exec()`` works the same way. |
| 157 | + |
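The namespace and return-value behaviour described above can be observed with the builtin ``exec()`` alone. This is a sketch of the builtin's behaviour only; an explicit dict (``ns``, an illustrative name) plays the role that ``__main__.__dict__`` would play for ``Interpreter.exec()``.

```python
# One namespace reused across calls; exec() never clears it.
ns = {}

# First call: initialize state (an import and a variable).
exec("import math\ncounter = 1", ns)

# Second call: picks up where the first one left off.
exec("counter += 1\narea = math.pi * 2 ** 2", ns)

# The value of the executed expression is discarded; exec() returns None.
result = exec("counter * 10", ns)

print(ns["counter"], result)
```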
| 158 | +The builtin ``exec()`` propagates any uncaught exception from the code |
| 159 | +it ran. The exception is raised from the ``exec()`` call in the |
| 160 | +thread that originally called ``exec()``. |
| 161 | + |
| 162 | +``Interpreter.exec()`` works the same way. |
| 163 | + |
| 164 | +...with one slight difference. Rather than propagate the uncaught |
| 165 | +exception directly, we raise an ``interpreters.RunFailedError`` |
| 166 | +with a snapshot of the uncaught exception (including its traceback) |
| 167 | +as the ``__cause__``. Directly raising (a proxy of) the exception |
| 168 | +is problematic since it makes it harder to distinguish between an |
| 169 | +error in the ``Interpreter.exec()`` call itself and an uncaught |
| 170 | +exception from the subinterpreter. |
| 171 | + |
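The wrapping described above resembles ordinary exception chaining. The following sketch is an analogy within a single interpreter: the ``RunFailedError`` class and the ``run()`` helper are hypothetical stand-ins defined here, and the live exception is chained directly, whereas the proposal attaches a snapshot of it.

```python
class RunFailedError(RuntimeError):
    """Hypothetical stand-in for the proposed interpreters.RunFailedError."""


def run(source, ns):
    """Run source with the builtin exec(), wrapping any uncaught exception."""
    try:
        exec(source, ns)
    except Exception as exc:
        # Chain the original exception so it is available as __cause__.
        raise RunFailedError("uncaught exception in executed code") from exc


try:
    run("1 / 0", {})
except RunFailedError as err:
    # The caller sees one well-known error type and inspects the cause.
    cause = err.__cause__
    print(type(cause).__name__)
```

With this shape, a caller can tell a failure of the executed code (a ``RunFailedError``) apart from any other error raised by the call itself.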
| 172 | +API: Communicating Between Interpreters |
| 173 | +--------------------------------------- |
| 174 | + |
| 175 | +The module introduces a basic communication mechanism called "channels". |
| 176 | +They are based on `CSP`_, as is ``Go``'s concurrency model (loosely). |
| 177 | +Channels are like pipes: FIFO queues with distinct send/recv ends. |
| 178 | +They are designed to work safely between isolated interpreters. |
| 179 | + |
| 180 | +.. _CSP: |
| 181 | + https://en.wikipedia.org/wiki/Communicating_sequential_processes |
| 182 | + |
| 183 | +For now, only objects that are specifically supported for passing |
| 184 | +between interpreters may be sent through a channel. |
| 185 | +See `Shareable Objects`_. |
| 186 | + |
| 187 | +The module's top-level API for this new mechanism: |
| 188 | + |
| 189 | ++----------------------------------------------------+-----------------------+ |
| 190 | +| signature | description | |
| 191 | ++====================================================+=======================+ |
| 192 | +| ``create_channel() -> (RecvChannel, SendChannel)`` | Create a new channel. | |
| 193 | ++----------------------------------------------------+-----------------------+ |
| 194 | + |
| 195 | +The objects for the two ends of a channel: |
| 196 | + |
| 197 | ++------------------------------------------+-----------------------------------------------+ |
| 198 | +| signature | description | |
| 199 | ++==========================================+===============================================+ |
| 200 | +| ``class RecvChannel(id)`` | The receiving end of a channel. | |
| 201 | ++------------------------------------------+-----------------------------------------------+ |
| 202 | +| ``.id`` | The channel's unique ID. | |
| 203 | ++------------------------------------------+-----------------------------------------------+ |
| 204 | +| ``.recv() -> object`` | | Get the next object from the channel, | |
| 205 | +| | | and wait if none have been sent. | |
| 206 | ++------------------------------------------+-----------------------------------------------+ |
| 207 | +| ``.recv_nowait(default=None) -> object`` | | Like recv(), but return the default | |
| 208 | +| | | instead of waiting. | |
| 209 | ++------------------------------------------+-----------------------------------------------+ |
| 210 | + |
| 211 | +| |
| 212 | + |
| 213 | ++------------------------------+---------------------------------------------------------------------+ |
| 214 | +| signature | description | |
| 215 | ++==============================+=====================================================================+ |
| 216 | +| ``class SendChannel(id)`` | The sending end of a channel. | |
| 217 | ++------------------------------+---------------------------------------------------------------------+ |
| 218 | +| ``.id`` | The channel's unique ID. | |
| 219 | ++------------------------------+---------------------------------------------------------------------+ |
| 220 | +| ``.send(obj)`` | | Send the `shareable object <Shareable Objects_>`_ (i.e. its data) | |
| 221 | +| | | to the receiving end of the channel and wait. | |
| 222 | ++------------------------------+---------------------------------------------------------------------+ |
| 223 | +| ``.send_nowait(obj)`` | Like send(), but return False if not received. | |
| 224 | ++------------------------------+---------------------------------------------------------------------+ |
| 225 | + |
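To illustrate the receiving-side semantics in the tables above (FIFO order, ``recv()`` waiting, ``recv_nowait()`` returning a default), here is an in-process analogy built on ``queue.Queue``. It is a sketch only: ``RecvEnd`` and ``make_pipe()`` are hypothetical names, nothing here crosses interpreters, and the sending side simply buffers, which differs from the waiting behaviour described for ``send()``.

```python
import queue


class RecvEnd:
    """Hypothetical single-interpreter stand-in for a receiving end."""

    def __init__(self, q):
        self._q = q

    def recv(self):
        # Block until an item is available.
        return self._q.get()

    def recv_nowait(self, default=None):
        # Return the default instead of waiting when nothing was sent.
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return default


def make_pipe():
    """Return (receiving end, put function) sharing one FIFO queue."""
    q = queue.Queue()
    return RecvEnd(q), q.put


recv_end, put = make_pipe()
first = recv_end.recv_nowait("nothing yet")
put("a")
put("b")
items = [recv_end.recv(), recv_end.recv_nowait()]
after = recv_end.recv_nowait()
print(first, items, after)
```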
| 226 | +Shareable Objects |
| 227 | +----------------- |
| 228 | + |
| 229 | +Both ``Interpreter.set_main_attrs()`` and channels work only with |
| 230 | +"shareable" objects. |
| 231 | + |
| 232 | +A "shareable" object is one which may be passed from one interpreter |
| 233 | +to another. The object is not necessarily actually shared by the |
| 234 | +interpreters. However, the object in the one interpreter is guaranteed |
| 235 | +to exactly match the corresponding object in the other interpreter. |
| 236 | + |
| 237 | +For some types, the actual object is shared. For some, the object's |
| 238 | +underlying data is actually shared but each interpreter has a distinct |
| 239 | +object wrapping that data. For all other shareable types, a strict copy |
| 240 | +or proxy is made such that the corresponding objects continue to match. |
| 241 | + |
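The middle case above (shared underlying data, distinct wrapper objects) has a familiar single-interpreter analogue in ``memoryview``. This sketch shows only that analogue; how the runtime arranges it across interpreters is not shown here.

```python
# One underlying buffer...
data = bytearray(b"abcd")

# ...wrapped by two distinct memoryview objects.
view_a = memoryview(data)
view_b = memoryview(data)

# A write through one wrapper is visible through the other.
view_a[0] = ord("z")

print(view_a is view_b, bytes(view_b))
```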
| 242 | +For now, shareable objects must be specifically supported internally |
| 243 | +by the Python runtime. |
| 244 | + |
| 245 | +Here's the initial list of supported objects: |
| 246 | + |
| 247 | +* ``str`` |
| 248 | +* ``bytes`` |
| 249 | +* ``int`` |
| 250 | +* ``float`` |
| 251 | +* ``bool`` (``True``/``False``) |
| 252 | +* ``None`` |
| 253 | +* ``tuple`` (only with shareable items) |
| 254 | +* channels (``SendChannel``/``RecvChannel``) |
| 255 | +* ``memoryview`` |
| 256 | + |
| 257 | +Again, for some types the actual object is shared, whereas for others |
| 258 | +only the underlying data (or even a copy or proxy) is shared. |
| 259 | +Eventually mutable objects may also be shareable. |
| 260 | + |
| 261 | +Regardless, the guarantee of "shareable" objects is that corresponding |
| 262 | +objects in different interpreters will always strictly match each other. |
| 263 | + |
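As a rough illustration of the list above, the following hypothetical helper checks whether a value is built only from the listed types. It is not part of the proposed API. Two assumptions are made for the sketch: channel objects are omitted because they do not exist outside the proposed module, and exact types are required (subclasses are rejected).

```python
# Types from the initial list, other than tuple and channels.
_LEAF_TYPES = (str, bytes, int, float, bool, type(None), memoryview)


def is_shareable(obj):
    """Hypothetical check mirroring the initial list of shareable types."""
    if type(obj) is tuple:
        # A tuple qualifies only if every item qualifies, recursively.
        return all(is_shareable(item) for item in obj)
    return type(obj) in _LEAF_TYPES


print(is_shareable((1, "a", (None, 2.0))))
print(is_shareable((1, [2])))
```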
| 264 | +Examples |
| 265 | +-------- |
| 266 | + |
| 267 | +Using interpreters as workers, with channels to communicate: |
| 268 | + |
| 269 | +:: |
| 270 | + |
| 271 | +    tasks_recv, tasks = interpreters.create_channel() |
| 272 | +    results, results_send = interpreters.create_channel() |
| 273 | + |
| 274 | +    def worker(): |
| 275 | +        interp = interpreters.create() |
| 276 | +        interp.set_main_attrs(tasks=tasks_recv, results=results_send) |
| 277 | +        interp.exec(tw.dedent(""" |
| 278 | +            def handle_request(req): |
| 279 | +                ... |
| 280 | + |
| 281 | +            def capture_exception(exc): |
| 282 | +                ... |
| 283 | + |
| 284 | +            while True: |
| 285 | +                try: |
| 286 | +                    req = tasks.recv() |
| 287 | +                except Exception: |
| 288 | +                    # channel closed |
| 289 | +                    break |
| 290 | +                try: |
| 291 | +                    res = handle_request(req) |
| 292 | +                except Exception as exc: |
| 293 | +                    res = capture_exception(exc) |
| 294 | +                results.send_nowait(res) |
| 295 | +            """)) |
| 296 | +    threads = [threading.Thread(target=worker) for _ in range(20)] |
| 297 | +    for t in threads: |
| 298 | +        t.start() |
| 299 | + |
| 300 | +    requests = ... |
| 301 | +    for req in requests: |
| 302 | +        tasks.send(req) |
| 303 | +    tasks.close() |
| 304 | + |
| 305 | +    for t in threads: |
| 306 | +        t.join() |
| 307 | + |
| 308 | +Sharing a memoryview (imagine map-reduce): |
| 309 | + |
| 310 | +:: |
| 311 | + |
| 312 | +    data, chunksize = read_large_data_set() |
| 313 | +    buf = memoryview(data) |
| 314 | +    numchunks = (len(buf) + chunksize - 1) // chunksize  # ceiling division |
| 315 | +    results = memoryview(bytearray(numchunks))  # writable, zero-filled |
| 316 | + |
| 317 | +    tasks_recv, tasks = interpreters.create_channel() |
| 318 | + |
| 319 | +    def worker(): |
| 320 | +        interp = interpreters.create() |
| 321 | +        interp.set_main_attrs(data=buf, results=results, tasks=tasks_recv) |
| 322 | +        interp.exec(tw.dedent(""" |
| 323 | +            while True: |
| 324 | +                try: |
| 325 | +                    req = tasks.recv() |
| 326 | +                except Exception: |
| 327 | +                    # channel closed |
| 328 | +                    break |
| 329 | +                resindex, start, end = req |
| 330 | +                chunk = data[start:end] |
| 331 | +                res = reduce_chunk(chunk) |
| 332 | +                results[resindex] = res |
| 333 | +            """)) |
| 334 | +    t = threading.Thread(target=worker) |
| 335 | +    t.start() |
| 336 | + |
| 337 | +    for i in range(numchunks): |
| 338 | +        if not workers_running(): |
| 339 | +            raise ... |
| 340 | +        start = i * chunksize |
| 341 | +        end = start + chunksize |
| 342 | +        if end > len(buf): |
| 343 | +            end = len(buf) |
| 344 | +        tasks.send((i, start, end))  # same order the worker unpacks |
| 345 | +    tasks.close() |
| 346 | +    t.join() |
| 347 | + |
| 348 | +    use_results(results) |
| 349 | + |
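The task tuples in the map-reduce example are derived from chunk boundaries over the buffer. A standalone sketch of that arithmetic follows; ``chunk_bounds()`` is a hypothetical helper, not part of the proposal, and it yields ``(index, start, end)`` in the order the worker unpacks them.

```python
def chunk_bounds(total, chunksize):
    """Yield (index, start, end) for each chunk covering range(total)."""
    # Ceiling division: a partial final chunk still counts as a chunk.
    numchunks = (total + chunksize - 1) // chunksize
    for i in range(numchunks):
        start = i * chunksize
        # Clamp the last chunk to the end of the data.
        end = min(start + chunksize, total)
        yield i, start, end


bounds = list(chunk_bounds(10, 4))
print(bounds)
```

The number of tuples produced is also the number of result slots needed, so a writable results buffer can be sized from it, for example with ``bytearray(len(bounds))``.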
| 350 | + |
| 351 | +Documentation |
| 352 | +============= |
| 353 | + |
| 354 | +The new stdlib docs page for the ``interpreters`` module will include |
| 355 | +the following: |
| 356 | + |
| 357 | +* (at the top) a clear note that support for multiple interpreters |
| 358 | + is not required from extension modules |
| 359 | +* some explanation about what subinterpreters are |
| 360 | +* brief examples of how to use multiple interpreters |
| 361 | + (and communicating between them) |
| 362 | +* a summary of the limitations of using multiple interpreters |
| 363 | +* (for extension maintainers) a link to the resources for ensuring |
| 364 | + multiple interpreters compatibility |
| 365 | +* much of the API information in this PEP |
| 366 | + |
| 367 | +Docs about resources for extension maintainers already exist on the |
| 368 | +`Isolating Extension Modules <isolation-howto_>`_ howto page. Any |
| 369 | +extra help will be added there. For example, it may prove helpful |
| 370 | +to discuss strategies for dealing with linked libraries that keep |
| 371 | +their own subinterpreter-incompatible global state. |
| 372 | + |
| 373 | +.. _isolation-howto: |
| 374 | + https://docs.python.org/3/howto/isolating-extensions.html |
| 375 | + |
| 376 | +Also, the ``ImportError`` for incompatible extension modules will be |
| 377 | +updated to clearly say it is due to missing multiple interpreters |
| 378 | +compatibility and that extensions are not required to provide it. This |
| 379 | +will help set user expectations properly. |
| 380 | + |
| 381 | + |
| 382 | +Rejected Ideas |
| 383 | +============== |
| 384 | + |
| 385 | +See :pep:`554`. |
| 386 | + |
| 387 | + |
| 388 | +Copyright |
| 389 | +========= |
| 390 | + |
| 391 | +This document is placed in the public domain or under the |
| 392 | +CC0-1.0-Universal license, whichever is more permissive. |