Add PEP 734. · python/peps@1b7f9eb · GitHub

Commit 1b7f9eb

Add PEP 734.
1 parent c8d079f commit 1b7f9eb

1 file changed (+392, -0 lines)

peps/pep-0734.rst

Lines changed: 392 additions & 0 deletions
@@ -0,0 +1,392 @@
PEP: 734
Title: Multiple Interpreters in the Stdlib
Author: Eric Snow <ericsnowcurrently@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 06-Nov-2023
Python-Version: 3.13


Abstract
========

I propose that we add a new module, "interpreters", to the standard
library, to make the existing multiple-interpreters feature of CPython
more easily accessible to Python code. This is particularly relevant
now that we have a per-interpreter GIL (:pep:`684`) and people are
more interested in using multiple interpreters. Without a stdlib
module, users are limited to the `C-API`_, which restricts how much
they can try out and take advantage of multiple interpreters.

.. _C-API:
   https://docs.python.org/3/c-api/init.html#sub-interpreter-support


Rationale
=========

The ``interpreters`` module will provide a high-level interface to the
multiple interpreter functionality. Since we have no experience with
how users will make use of multiple interpreters in Python code, we are
purposefully keeping the initial API as lean and minimal as possible.
The objective is to provide a well-considered foundation on which we may
add more-advanced functionality later.

That said, the proposed design incorporates lessons learned from
existing use of subinterpreters by the community, from existing stdlib
modules, and from other programming languages. It also factors in
experience from using subinterpreters in the CPython test suite and
using them in `concurrency benchmarks`_.

.. _concurrency benchmarks:
   https://github.com/ericsnowcurrently/concurrency-benchmarks

The module will include a basic mechanism for communicating between
interpreters. Without one, multiple interpreters are a much less
useful feature.


Specification
=============

The module will:

* expose the existing multiple interpreter support
* introduce a basic mechanism for communicating between interpreters

The module will wrap a new low-level ``_interpreters`` module
(in the same way that the ``threading`` module wraps ``_thread``).
However, that low-level API is not intended for public use and thus is
not part of this proposal.

We also expect that an ``InterpreterPoolExecutor`` will be added to the
``concurrent.futures`` module, but that is outside the scope of this PEP.

API: Using Interpreters
-----------------------

The module's top-level API for managing interpreters looks like this:

+----------------------------------+----------------------------------------------+
| signature                        | description                                  |
+==================================+==============================================+
| ``list_all() -> [Interpreter]``  | Get all existing interpreters.               |
+----------------------------------+----------------------------------------------+
| ``get_current() -> Interpreter`` | Get the currently running interpreter.       |
+----------------------------------+----------------------------------------------+
| ``create() -> Interpreter``      | Initialize a new (idle) Python interpreter.  |
+----------------------------------+----------------------------------------------+

Each interpreter object:

+----------------------------------+------------------------------------------------+
| signature                        | description                                    |
+==================================+================================================+
| ``class Interpreter``            | A single interpreter.                          |
+----------------------------------+------------------------------------------------+
| ``.id``                          | The interpreter's ID (read-only).              |
+----------------------------------+------------------------------------------------+
| ``.is_running() -> bool``        | Is the interpreter currently executing code?   |
+----------------------------------+------------------------------------------------+
| ``.set_main_attrs(**kwargs)``    | Bind objects in ``__main__``.                  |
+----------------------------------+------------------------------------------------+
| ``.exec(code, /)``               | | Run the given source code in the interpreter |
|                                  | | (in the current thread).                     |
+----------------------------------+------------------------------------------------+

Additional details:

* Every ``Interpreter`` instance wraps an ``InterpreterID`` object.
  When there are no more references to an interpreter's ID, it gets
  finalized. Thus no interpreters created through
  ``interpreters.create()`` will leak.

|

* ``Interpreter.is_running()`` reports only whether a thread is
  running a script (code) in the interpreter's ``__main__`` module.
  In practice that means whether ``Interpreter.exec()`` is currently
  running in some thread. Code running in sub-threads is ignored.

|

* ``Interpreter.set_main_attrs()`` will only allow (for now) objects
  that are specifically supported for passing between interpreters.
  See `Shareable Objects`_.

|

* ``Interpreter.set_main_attrs()`` is helpful for initializing the
  globals for an interpreter before running code in it.

|

* ``Interpreter.exec()`` resets neither the interpreter's state nor
  the ``__main__`` module, before or after running, so each
  successive call picks up where the last one left off. This can
  be useful for running some code to initialize an interpreter
  (e.g. with imports) before later performing some repeated task.

Comparison with builtins.exec()
-------------------------------

``Interpreter.exec()`` is essentially the same as the builtin
``exec()``, except it targets a different interpreter, using that
interpreter's isolated state.

The builtin ``exec()`` runs in the current OS thread and pauses
whatever was running there, which resumes when ``exec()`` finishes.
No other threads are affected. (To avoid pausing the current thread,
run ``exec()`` in a ``threading.Thread``.)
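
As a minimal illustration of that parenthetical, using only the builtin ``exec()`` and ``threading`` (the names ``namespace`` and ``source`` are made up for this example):

```python
import threading

namespace = {}
source = "total = sum(range(10))"

# exec() blocks whichever thread calls it, so hand it to a helper thread.
t = threading.Thread(target=exec, args=(source, namespace))
t.start()
# ... the calling thread is free to do other work here ...
t.join()

total = namespace["total"]
```

The same pattern is what the worker examples later in this document use with ``Interpreter.exec()``.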

``Interpreter.exec()`` works the same way.

The builtin ``exec()`` takes a namespace against which it executes.
It uses that namespace as-is and does not clear it before or after.

``Interpreter.exec()`` works the same way.

...with one slight difference: the namespace is implicit
(the ``__main__`` module's ``__dict__``). This is the same as how
scripts run from the Python command line or REPL work.
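
The namespace behaviour can be seen with the builtin ``exec()`` alone. In this sketch a throwaway module object stands in for another interpreter's ``__main__``; that substitution is an assumption made purely for illustration.

```python
import types

# A module object standing in for another interpreter's ``__main__``.
main = types.ModuleType("__main__")
ns = main.__dict__

exec("import math\ncounter = 0", ns)  # one-time setup
exec("counter += 1", ns)              # later calls see the earlier state
exec("counter += 1\narea = math.pi * 2 ** 2", ns)

counter = ns["counter"]
area = ns["area"]
```

Nothing is cleared between calls, so the import and the counter from the first call are still visible in the third.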

The builtin ``exec()`` discards any object returned from the
executed code.

``Interpreter.exec()`` works the same way.

The builtin ``exec()`` propagates any uncaught exception from the code
it ran. The exception is raised from the ``exec()`` call in the
thread that originally called ``exec()``.

``Interpreter.exec()`` works the same way.

...with one slight difference. Rather than propagate the uncaught
exception directly, we raise an ``interpreters.RunFailedError``
with a snapshot of the uncaught exception (including its traceback)
as the ``__cause__``. Directly raising (a proxy of) the exception
is problematic because it makes it harder to distinguish between an
error in the ``Interpreter.exec()`` call itself and an uncaught
exception from the subinterpreter.
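
The wrapping pattern can be sketched in a single interpreter with exception chaining. Here ``RunFailedError`` is a local stand-in class and ``run()`` is a hypothetical helper. The real proposal attaches a snapshot of the exception, whereas this sketch chains the live exception object.

```python
class RunFailedError(RuntimeError):
    """Local stand-in for the proposed ``interpreters.RunFailedError``."""


def run(source, namespace):
    """Hypothetical helper: exec ``source`` and wrap any uncaught exception."""
    try:
        exec(source, namespace)
    except Exception as exc:
        # ``raise ... from exc`` stores the original on ``__cause__``.
        raise RunFailedError(f"{type(exc).__name__}: {exc}") from exc


try:
    run("1 / 0", {})
except RunFailedError as err:
    cause = err.__cause__
```

A caller can then catch ``RunFailedError`` to handle failures of the executed code and inspect ``__cause__`` for the details.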

API: Communicating Between Interpreters
---------------------------------------

The module introduces a basic communication mechanism called "channels".
They are based on `CSP`_, as is ``Go``'s concurrency model (loosely).
Channels are like pipes: FIFO queues with distinct send/recv ends.
They are designed to work safely between isolated interpreters.

.. _CSP:
   https://en.wikipedia.org/wiki/Communicating_sequential_processes

For now, only objects that are specifically supported for passing
between interpreters may be sent through a channel.
See `Shareable Objects`_.

The module's top-level API for this new mechanism:

+----------------------------------------------------+-----------------------+
| signature                                          | description           |
+====================================================+=======================+
| ``create_channel() -> (RecvChannel, SendChannel)`` | Create a new channel. |
+----------------------------------------------------+-----------------------+

The objects for the two ends of a channel:

+------------------------------------------+-----------------------------------------+
| signature                                | description                             |
+==========================================+=========================================+
| ``class RecvChannel(id)``                | The receiving end of a channel.         |
+------------------------------------------+-----------------------------------------+
| ``.id``                                  | The channel's unique ID.                |
+------------------------------------------+-----------------------------------------+
| ``.recv() -> object``                    | | Get the next object from the channel, |
|                                          | | and wait if none have been sent.      |
+------------------------------------------+-----------------------------------------+
| ``.recv_nowait(default=None) -> object`` | | Like recv(), but return the default   |
|                                          | | instead of waiting.                   |
+------------------------------------------+-----------------------------------------+

|

+------------------------------+---------------------------------------------------------------------+
| signature                    | description                                                         |
+==============================+=====================================================================+
| ``class SendChannel(id)``    | The sending end of a channel.                                       |
+------------------------------+---------------------------------------------------------------------+
| ``.id``                      | The channel's unique ID.                                            |
+------------------------------+---------------------------------------------------------------------+
| ``.send(obj)``               | | Send the `shareable object <Shareable Objects_>`_ (i.e. its data) |
|                              | | to the receiving end of the channel and wait.                     |
+------------------------------+---------------------------------------------------------------------+
| ``.send_nowait(obj)``        | Like send(), but return False if not received.                      |
+------------------------------+---------------------------------------------------------------------+
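
To get a feel for the FIFO and "no wait" semantics in the tables above, here is a toy, single-interpreter model built on ``queue.Queue``. It is not the proposed implementation: the classes take a queue rather than an ID, nothing crosses an interpreter boundary, and ``send()`` here does not wait for the receiver.

```python
import queue


class RecvChannel:
    """Toy receiving end, backed by an ordinary in-process queue."""

    def __init__(self, q):
        self._q = q

    def recv(self):
        return self._q.get()  # waits until an object arrives

    def recv_nowait(self, default=None):
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return default


class SendChannel:
    """Toy sending end."""

    def __init__(self, q):
        self._q = q

    def send(self, obj):
        self._q.put(obj)  # simplification: does not wait for the receiver


def create_channel():
    q = queue.Queue()
    return RecvChannel(q), SendChannel(q)


rch, sch = create_channel()
sch.send("a")
sch.send("b")
first, second = rch.recv(), rch.recv()       # FIFO order
fallback = rch.recv_nowait(default="empty")  # nothing left to receive
```

Note that ``create_channel()`` returns the receiving end first, matching the signature in the table.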

Shareable Objects
-----------------

Both ``Interpreter.set_main_attrs()`` and channels work only with
"shareable" objects.

A "shareable" object is one which may be passed from one interpreter
to another. The object is not necessarily shared by the
interpreters. However, the object in the one interpreter is guaranteed
to exactly match the corresponding object in the other interpreter.

For some types, the actual object is shared. For some, the object's
underlying data is shared but each interpreter has a distinct
object wrapping that data. For all other shareable types, a strict copy
or proxy is made such that the corresponding objects continue to match.

For now, shareable objects must be specifically supported internally
by the Python runtime.

Here's the initial list of supported objects:

* ``str``
* ``bytes``
* ``int``
* ``float``
* ``bool`` (``True``/``False``)
* ``None``
* ``tuple`` (only with shareable items)
* channels (``SendChannel``/``RecvChannel``)
* ``memoryview``

Again, for some types the actual object is shared, whereas for others
only the underlying data (or even a copy or proxy) is shared.
Eventually mutable objects may also be shareable.

Regardless, the guarantee of "shareable" objects is that corresponding
objects in different interpreters will always strictly match each other.
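
The list above, including the recursive rule for tuples, can be expressed as a small predicate. ``is_shareable()`` is a hypothetical helper, not part of the proposed API. Matching on the exact type (so subclasses are rejected) is an assumption, and channel objects are left out because this sketch has no ``interpreters`` module to import them from.

```python
# Types taken from the list above (channels omitted, see the lead-in).
_SHAREABLE_TYPES = (str, bytes, int, float, bool, type(None), memoryview)


def is_shareable(obj):
    """Hypothetical helper: would ``obj`` be accepted under the list above?"""
    if isinstance(obj, tuple):
        # A tuple is shareable only if every item is, at any depth.
        return all(is_shareable(item) for item in obj)
    return type(obj) in _SHAREABLE_TYPES


ok = is_shareable(("spam", 1, (2.0, None)))
bad = is_shareable(("spam", [1, 2]))  # the list item is not shareable
```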

Examples
--------

Using interpreters as workers, with channels to communicate:

::

    import interpreters
    import textwrap as tw
    import threading

    tasks_recv, tasks = interpreters.create_channel()
    results, results_send = interpreters.create_channel()

    def worker():
        # Each worker thread drives its own interpreter.
        interp = interpreters.create()
        interp.set_main_attrs(tasks=tasks_recv, results=results_send)
        interp.exec(tw.dedent("""
            def handle_request(req):
                ...

            def capture_exception(exc):
                ...

            while True:
                try:
                    req = tasks.recv()
                except Exception:
                    # channel closed
                    break
                try:
                    res = handle_request(req)
                except Exception as exc:
                    res = capture_exception(exc)
                results.send_nowait(res)
            """))
    threads = [threading.Thread(target=worker) for _ in range(20)]
    for t in threads:
        t.start()

    requests = ...
    for req in requests:
        tasks.send(req)
    tasks.close()

    for t in threads:
        t.join()

Sharing a memoryview (imagine map-reduce):

::

    import interpreters
    import textwrap as tw
    import threading

    data, chunksize = read_large_data_set()
    buf = memoryview(data)
    # Ceiling division, so a final partial chunk still gets a slot.
    numchunks = (len(buf) + chunksize - 1) // chunksize
    # A bytearray keeps the results buffer writable from the worker.
    results = memoryview(bytearray(numchunks))

    tasks_recv, tasks = interpreters.create_channel()

    def worker():
        interp = interpreters.create()
        interp.set_main_attrs(data=buf, results=results, tasks=tasks_recv)
        interp.exec(tw.dedent("""
            while True:
                try:
                    req = tasks.recv()
                except Exception:
                    # channel closed
                    break
                resindex, start, end = req
                chunk = data[start: end]
                res = reduce_chunk(chunk)
                results[resindex] = res
            """))
    t = threading.Thread(target=worker)
    t.start()

    for i in range(numchunks):
        if not workers_running():
            raise ...
        start = i * chunksize
        end = start + chunksize
        if end > len(buf):
            end = len(buf)
        # Same field order the worker unpacks: (resindex, start, end).
        tasks.send((i, start, end))
    tasks.close()
    t.join()

    use_results(results)
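
The chunk arithmetic used when splitting a buffer for workers is easy to get wrong, so it may help to check it in isolation. The sketch below is a standalone illustration: ``chunk_bounds()`` is a hypothetical helper, and it uses integer ceiling division so that a trailing partial chunk is counted.

```python
def chunk_bounds(total, chunksize):
    """Yield (index, start, end) for each chunk; the last may be short."""
    numchunks = (total + chunksize - 1) // chunksize  # ceiling division
    for i in range(numchunks):
        start = i * chunksize
        yield i, start, min(start + chunksize, total)


# A 10-byte buffer in chunks of 4 gives two full chunks and one short one.
bounds = list(chunk_bounds(10, 4))
```

Each tuple has the ``(resindex, start, end)`` layout that the worker loop in the memoryview example unpacks.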


Documentation
=============

The new stdlib docs page for the ``interpreters`` module will include
the following:

* (at the top) a clear note that support for multiple interpreters
  is not required from extension modules
* some explanation about what subinterpreters are
* brief examples of how to use multiple interpreters
  (and how to communicate between them)
* a summary of the limitations of using multiple interpreters
* (for extension maintainers) a link to the resources for ensuring
  multiple interpreters compatibility
* much of the API information in this PEP

Docs about resources for extension maintainers already exist on the
`Isolating Extension Modules <isolation-howto_>`_ howto page. Any
extra help will be added there. For example, it may prove helpful
to discuss strategies for dealing with linked libraries that keep
their own subinterpreter-incompatible global state.

.. _isolation-howto:
   https://docs.python.org/3/howto/isolating-extensions.html

Also, the ``ImportError`` for incompatible extension modules will be
updated to clearly say it is due to missing multiple interpreters
compatibility and that extensions are not required to provide it. This
will help set user expectations properly.


Rejected Ideas
==============

See :pep:`554`.


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.
