From c5271cf8988a4a7d4aed62dba7aeff3de90d9894 Mon Sep 17 00:00:00 2001 From: Eric Snow Date: Fri, 22 Mar 2019 22:35:11 -0600 Subject: [PATCH 1/6] Fix typos, formatting, and small clarifications. --- pep-0554.rst | 112 +++++++++++++++++++++++++++++---------------------- 1 file changed, 63 insertions(+), 49 deletions(-) diff --git a/pep-0554.rst b/pep-0554.rst index d7394d369e9..4cf07b1b579 100644 --- a/pep-0554.rst +++ b/pep-0554.rst @@ -15,7 +15,7 @@ Abstract CPython has supported multiple interpreters in the same process (AKA "subinterpreters") since version 1.5 (1997). The feature has been -available via the C-API. [c-api]_ Subinterpreters operate in +available via the C-API. [c-api]_ Subinterpreters operate in `relative isolation from one another `_, which provides the basis for an `alternative concurrency model `_. @@ -152,7 +152,7 @@ For sharing data between interpreters: | | | receiving end of the channel and wait. | | | | Associate the interpreter with the channel. | +---------------------------+-------------------------------------------------+ -| .send_nowait(obj) | | Like send(), but Fail if not received. | +| .send_nowait(obj) | | Like send(), but fail if not received. | +---------------------------+-------------------------------------------------+ | .send_buffer(obj) | | Send the object's (PEP 3118) buffer to the | | | | receiving end of the channel and wait. | @@ -494,8 +494,8 @@ each with different goals. Most center on correctness and usability. One class of concurrency models focuses on isolated threads of execution that interoperate through some message passing scheme. A -notable example is `Communicating Sequential Processes`_ (CSP), upon -which Go's concurrency is based. The isolation inherent to +notable example is `Communicating Sequential Processes`_ (CSP) (upon +which Go's concurrency is roughly based). The isolation inherent to subinterpreters makes them well-suited to this approach. 
Shared data @@ -521,9 +521,9 @@ There are a number of valid solutions, several of which may be appropriate to support in Python. This proposal provides a single basic solution: "channels". Ultimately, any other solution will look similar to the proposed one, which will set the precedent. Note that the -implementation of ``Interpreter.run()`` can be done in a way that allows -for multiple solutions to coexist, but doing so is not technically -a part of the proposal here. +implementation of ``Interpreter.run()`` will be done in a way that +allows for multiple solutions to coexist, but doing so is not +technically a part of the proposal here. Regarding the proposed solution, "channels", it is a basic, opt-in data sharing mechanism that draws inspiration from pipes, queues, and CSP's @@ -534,7 +534,8 @@ channels have two operations: send and receive. A key characteristic of those operations is that channels transmit data derived from Python objects rather than the objects themselves. When objects are sent, their data is extracted. When the "object" is received in the other -interpreter, the data is converted back into an object. +interpreter, the data is converted back into an object owned by that +interpreter. To make this work, the mutable shared state will be managed by the Python runtime, not by any of the interpreters. Initially we will @@ -589,11 +590,11 @@ Finally, some potential isolation is missing due to the current design of CPython. Improvements are currently going on to address gaps in this area: -* interpreters share the GIL -* interpreters share memory management (e.g. allocators, gc) * GC is not run per-interpreter [global-gc]_ * at-exit handlers are not run per-interpreter [global-atexit]_ * extensions using the ``PyGILState_*`` API are incompatible [gilstate]_ +* interpreters share memory management (e.g. 
allocators, gc) +* interpreters share the GIL Existing Usage -------------- @@ -683,7 +684,7 @@ The module also provides the following class: "channels" keyword argument is provided (and is a mapping of attribute names to channels) then it is added to the interpreter's execution namespace (the interpreter's "__main__" module). If any - of the values are not are not RecvChannel or SendChannel instances + of the values are not RecvChannel or SendChannel instances then ValueError gets raised. This may not be called on an already running interpreter. Doing @@ -763,14 +764,15 @@ whether an object is shareable or not: a cross-interpreter way, whether via a proxy, a copy, or some other means. -This proposal provides two ways to do share such objects between +This proposal provides two ways to share such objects between interpreters. -First, shareable objects may be passed to ``run()`` as keyword arguments, -where they are effectively injected into the target interpreter's -``__main__`` module. This is mainly intended for sharing meta-objects -(e.g. channels) between interpreters, as it is less useful to pass other -objects (like ``bytes``) to ``run``. +First, channels may be passed to ``run()`` via the ``channels`` +keyword argument, where they are effectively injected into the target +interpreter's ``__main__`` module. While passing arbitrary shareable +objects this way is possible, doing so is mainly intended for sharing +meta-objects (e.g. channels) between interpreters. It is less useful +to pass other objects (like ``bytes``) to ``run`` directly. Second, the main mechanism for sharing objects (i.e. their data) between interpreters is through channels. A channel is a simplex FIFO similar @@ -778,6 +780,9 @@ to a pipe. The main difference is that channels can be associated with zero or more interpreters on either end. Unlike queues, which are also many-to-many, channels have no buffer. 
+The ``interpreters`` module provides the following functions and +classes related to channels: + ``create_channel()``:: Create a new channel and return (recv, send), the RecvChannel and @@ -802,24 +807,25 @@ many-to-many, channels have no buffer. ``RecvChannel(id)``:: The receiving end of a channel. An interpreter may use this to - receive objects from another interpreter. At first only bytes will - be supported. + receive objects from another interpreter. At first only a few of + the simple, immutable builtin types will be supported. id: - The channel's unique ID. + The channel's unique ID. This is shared with the "send" end. interpreters: The list of associated interpreters: those that have called - the "recv()" or "__next__()" methods and haven't called - "release()" (and the channel hasn't been explicitly closed). + the "recv()" method and haven't called "release()" (and the + channel hasn't been explicitly closed). recv(): Return the next object (i.e. the data from the sent object) from the channel. If none have been sent then wait until the next - send. This associates the current interpreter with the channel. + send. This associates the current interpreter with the "recv" + end of the channel. If the channel is already closed then raise ChannelClosedError. If the channel isn't closed but the current interpreter already @@ -848,7 +854,7 @@ many-to-many, channels have no buffer. to 0, the channel is actually marked as closed. The Python runtime will garbage collect all closed channels, though it may not be immediately. Note that "release()" is automatically called - in behalf of the current interpreter when the channel is no longer + on behalf of the current interpreter when the channel is no longer used (i.e. has no references) in that interpreter. This operation is idempotent. Return True if "release()" has not @@ -857,21 +863,21 @@ many-to-many, channels have no buffer. close(force=False): Close both ends of the channel (in all interpreters). 
This means - that any further use of the channel raises ChannelClosedError. If - the channel is not empty then raise ChannelNotEmptyError (if - "force" is False) or discard the remaining objects (if "force" - is True) and close it. + that any further use of the channel anywhere raises + ChannelClosedError. If the channel is not empty then raise + ChannelNotEmptyError (if "force" is False) or discard the + remaining objects (if "force" is True) and close it. ``SendChannel(id)``:: The sending end of a channel. An interpreter may use this to send - objects to another interpreter. At first only bytes will be - supported. + objects to another interpreter. At first only a few of + the simple, immutable builtin types will be supported. id: - The channel's unique ID. + The channel's unique ID. This is shared with the "recv" end. interpreters: @@ -882,8 +888,9 @@ many-to-many, channels have no buffer. Send the object (i.e. its data) to the receiving end of the channel. Wait until the object is received. If the the - object is not shareable then ValueError is raised. Currently - only bytes are supported. + object is not shareable then ValueError is raised. This + associates the current interpreter with the "send" end of the + channel. If the channel is already closed then raise ChannelClosedError. If the channel isn't closed but the current interpreter already @@ -892,9 +899,10 @@ many-to-many, channels have no buffer. send_nowait(obj): - Send the object to the receiving end of the channel. If the other - end is not currently receiving then raise NotReceivedError. - Otherwise this is the same as "send()". + Send the object to the receiving end of the channel. If no + interpreter is currently receiving (waiting on the other end) + then raise NotReceivedError. Otherwise this is the same as + "send()". send_buffer(obj): @@ -918,9 +926,9 @@ many-to-many, channels have no buffer. Close both ends of the channel (in all interpreters). 
No matter what the "send" end of the channel is immediately closed. If the channel is empty then close the "recv" end immediately too. - Otherwise wait until the channel is empty before closing it (if - "force" is False) or discard the remaining items and close - immediately (if "force" is True). + Otherwise, if "force" is False, close the "recv" end (and hence + the full channel) once the channel becomes empty; or, if "force" + is True, discard the remaining items and close immediately. Note that ``send_buffer()`` is similar to how ``multiprocessing.Connection`` works. [mp-conn]_ @@ -937,6 +945,7 @@ Open Questions Open Implementation Questions ============================= +.. XXX Does every interpreter think that their thread is the "main" thread? -------------------------------------------------------------------- @@ -949,6 +958,7 @@ or not. This presents a problem in cases where "main thread" is meant to imply "main thread in the main interpreter" [main-thread]_, where the main interpreter is the initial one. +.. XXX Disallow subinterpreters in the main thread? -------------------------------------------- @@ -1048,10 +1058,11 @@ Syntactic Support The ``Go`` language provides a concurrency model based on CSP, so it's similar to the concurrency model that subinterpreters support. -``Go`` provides syntactic support, as well several builtin concurrency -primitives, to make concurrency a first-class feature. Conceivably, -similar syntactic (and builtin) support could be added to Python using -subinterpreters. However, that is *way* outside the scope of this PEP! +However, ``Go`` also provides syntactic support, as well as several builtin +concurrency primitives, to make concurrency a first-class feature. +Conceivably, similar syntactic (and builtin) support could be added to +Python using subinterpreters. However, that is *way* outside the scope +of this PEP! Multiprocessing --------------- @@ -1072,19 +1083,21 @@ raise an ImportError if unsupported. 
Alternately we could support opting in to subinterpreter support. However, that would probably exclude many more modules (unnecessarily) -than the opt-out approach. +than the opt-out approach. Also, note that PEP 489 defined that an +extension's use of the PEP's machinery implies support for +subinterpreters. The scope of adding the ModuleDef slot and fixing up the import machinery is non-trivial, but could be worth it. It all depends on -how many extension modules break under subinterpreters. Given the -relatively few cases we know of through mod_wsgi, we can leave this -for later. +how many extension modules break under subinterpreters. Given that +there are relatively few cases we know of through mod_wsgi, we can +leave this for later. Poisoning channels ------------------ CSP has the concept of poisoning a channel. Once a channel has been -poisoned, and ``send()`` or ``recv()`` call on it will raise a special +poisoned, any ``send()`` or ``recv()`` call on it would raise a special exception, effectively ending execution in the interpreter that tried to use the poisoned channel. @@ -1092,6 +1105,7 @@ This could be accomplished by adding a ``poison()`` method to both ends of the channel. The ``close()`` method can be used in this way (mostly), but these semantics are relatively specialized and can wait. +.. XXX Sending channels over channels ------------------------------ @@ -1161,7 +1175,7 @@ Per Antoine Pitrou [async]_:: on (probably a file descriptor?). A possible solution is to provide async implementations of the blocking -channel methods (``__next__()``, ``recv()``, and ``send()``). However, +channel methods (``recv()``, and ``send()``). However, the basic functionality of subinterpreters does not depend on async and can be added later. From b5deb41ffde3dc249faeca233937831508c744b0 Mon Sep 17 00:00:00 2001 From: Eric Snow Date: Fri, 22 Mar 2019 22:44:51 -0600 Subject: [PATCH 2/6] Add channels to the list of shareable objects. 
--- pep-0554.rst | 12 ++---------- 1 file changed, 2 insertions(+), 10 deletions(-) diff --git a/pep-0554.rst b/pep-0554.rst index 4cf07b1b579..1cb6ccd9db8 100644 --- a/pep-0554.rst +++ b/pep-0554.rst @@ -51,6 +51,7 @@ At first only the following types will be supported for sharing: * str * int * PEP 3118 buffer objects (via ``send_buffer()``) +* PEP 554 channels Support for other basic types (e.g. bool, float, Ellipsis) will be added later. @@ -553,6 +554,7 @@ channels to the following: * str * int * PEP 3118 buffer objects (via ``send_buffer()``) +* channels Limiting the initial shareable types is a practical matter, reducing the potential complexity of the initial implementation. There are a @@ -1105,16 +1107,6 @@ This could be accomplished by adding a ``poison()`` method to both ends of the channel. The ``close()`` method can be used in this way (mostly), but these semantics are relatively specialized and can wait. -.. XXX -Sending channels over channels ------------------------------- - -Some advanced usage of subinterpreters could take advantage of the -ability to send channels over channels, in addition to bytes. Given -that channels will already be multi-interpreter safe, supporting then -in ``RecvChannel.recv()`` wouldn't be a big change. However, this can -wait until the basic functionality has been ironed out. - Reseting __main__ ----------------- From 9f189f90d197f857d0ed9b47157ba691279b96dc Mon Sep 17 00:00:00 2001 From: Eric Snow Date: Fri, 22 Mar 2019 22:47:49 -0600 Subject: [PATCH 3/6] Drop some open questions. --- pep-0554.rst | 45 --------------------------------------------- 1 file changed, 45 deletions(-) diff --git a/pep-0554.rst b/pep-0554.rst index 1cb6ccd9db8..2a56a721211 100644 --- a/pep-0554.rst +++ b/pep-0554.rst @@ -939,55 +939,10 @@ Note that ``send_buffer()`` is similar to how Open Questions ============== -* "force" argument to ``ch.release()``? * add a "tp_share" type slot instead of using a global registry for shareable types? 
-Open Implementation Questions -============================= - -.. XXX -Does every interpreter think that their thread is the "main" thread? --------------------------------------------------------------------- - -(This is more of an implementation detail that an issue for the PEP.) - -CPython's interpreter implementation identifies the OS thread in which -it was started as the "main" thread. The interpreter the has slightly -different behavior depending on if the current thread is the main one -or not. This presents a problem in cases where "main thread" is meant -to imply "main thread in the main interpreter" [main-thread]_, where -the main interpreter is the initial one. - -.. XXX -Disallow subinterpreters in the main thread? --------------------------------------------- - -(This is more of an implementation detail that an issue for the PEP.) - -This is a specific case of the above issue. Currently in CPython, -"we need a main \*thread\* in order to sensibly manage the way signal -handling works across different platforms". [main-thread]_ - -Since signal handlers are part of the interpreter state, running a -subinterpreter in the main thread means that the main interpreter -can no longer properly handle signals (since it's effectively paused). - -Furthermore, running a subinterpreter in the main thread would -conceivably allow setting signal handlers on that interpreter, which -would likewise impact signal handling when that interpreter isn't -running or is running in a different thread. - -Ultimately, running subinterpreters in the main OS thread introduces -complications to the signal handling implementation. So it may make -the most sense to disallow running subinterpreters in the main thread. -Support for it could be considered later. The downside is that folks -wanting to try out subinterpreters would be required to take the extra -step of using threads. 
This could slow adoption and experimentation, -whereas without the restriction there's less of an obstacle. - - Deferred Functionality ====================== From 0c6ac4c36513588fad52aaaf515251b20f19fc6f Mon Sep 17 00:00:00 2001 From: Eric Snow Date: Fri, 22 Mar 2019 23:50:59 -0600 Subject: [PATCH 4/6] Add an example. --- pep-0554.rst | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/pep-0554.rst b/pep-0554.rst index 2a56a721211..cce86973828 100644 --- a/pep-0554.rst +++ b/pep-0554.rst @@ -243,6 +243,24 @@ Handling an exception except interpreters.RunFailedError as exc: print(f"got the error from the subinterpreter: {exc}") +Re-raising an exception +----------------------- + +:: + + interp = interpreters.create() + try: + try: + interp.run(tw.dedent(""" + raise KeyError + """)) + except interpreters.RunFailedError as exc: + raise exc.__cause__ + except KeyError: + print("got a KeyError from the subinterpreter") + +Note that this pattern is a candidate for later improvement. + Synchronize using a channel --------------------------- From 83dece41edbcf8207ca0b6ae9b645dd44d8e065a Mon Sep 17 00:00:00 2001 From: Eric Snow Date: Fri, 22 Mar 2019 23:51:31 -0600 Subject: [PATCH 5/6] clarify --- pep-0554.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/pep-0554.rst b/pep-0554.rst index cce86973828..e3d41f115d3 100644 --- a/pep-0554.rst +++ b/pep-0554.rst @@ -758,9 +758,9 @@ interpreters, we create a surrogate of the exception and its traceback (see ``traceback.TracebackException``), set it to ``__cause__`` on a new ``RunFailedError``, and raise that. -Raising (a proxy of) the exception is problematic since it's harder to -distinguish between an error in the ``run()`` call and an uncaught -exception from the subinterpreter. +Raising (a proxy of) the exception directly is problematic since it's +harder to distinguish between an error in the ``run()`` call and an +uncaught exception from the subinterpreter. 
API for sharing data From b1f4495f0e784d085e095e20d695d588f4ed6d69 Mon Sep 17 00:00:00 2001 From: Eric Snow Date: Fri, 22 Mar 2019 23:51:55 -0600 Subject: [PATCH 6/6] Add an "Implementation" section. --- pep-0554.rst | 36 ++++++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+) diff --git a/pep-0554.rst b/pep-0554.rst index e3d41f115d3..95f1d8effb3 100644 --- a/pep-0554.rst +++ b/pep-0554.rst @@ -1299,6 +1299,39 @@ Rejected possible solutions: to do something similar +Implementation +============== + +The implementation of the PEP has 4 parts: + +* the high-level module described in this PEP (mostly a light wrapper + around a low-level C extension) +* the low-level C extension module +* additions to the ("private") C-API needed by the low-level module +* secondary fixes/changes in the CPython runtime that facilitate + the low-level module (among other benefits) + +These are at various levels of completion, with more done the lower +you go: + +* the high-level module has been, at best, roughly implemented. + However, fully implementing it will be almost trivial. +* the low-level module is mostly complete. The bulk of the + implementation was merged into master in December 2018 as the + "_xxsubinterpreters" module (for the sake of testing subinterpreter + functionality). Only 3 parts of the implementation remain: + "send_wait()", "send_buffer()", and exception propagation. All three + have been mostly finished, but were blocked by work related to ceval. + That blocker is basically resolved now and finishing the low-level + module will not require extensive work. +* all necessary C-API work has been finished +* all anticipated work in the runtime has been finished + +The implementation effort for PEP 554 is being tracked as part of +a larger project aimed at improving multi-core support in CPython. +[multi-core-project]_ + + References ========== @@ -1368,6 +1401,9 @@ References .. 
[pypy] https://mail.python.org/pipermail/python-ideas/2017-September/046973.html +.. [multi-core-project] + https://github.com/ericsnowcurrently/multi-core-python + Copyright =========