PEP 788: Address feedback from first discussion round #4400
Conversation
@godlygeek I can't add you as a reviewer, but I'd appreciate your opinion on some of the new things I'm proposing here.

Bump @AA-Turner @vstinner now that the beta is out. There's no rush considering we have a year until the 3.15 freeze, though.
I'm not comfortable with …
LGTM. Thanks, the API looks better to me.
When does `PyThreadState_Ensure()` create a new thread state instead of reusing an existing one?
deallocated, and shutdown for the main interpreter includes the entire Python
runtime being finalized.

Native and Python Threads
"Native threads" is not a good term for non-threading module created threads even if you (re)define the term. It perpetuates a confusion that Python threads aren't "real" OS threads.
We use "non-Python created threads" in, e.g., https://docs.python.org/3/c-api/init.html#non-python-created-threads.
I don't totally disagree, but I'd really prefer not using the clunky phrase "non-Python created thread", especially considering how many times "native thread" is used throughout the PEP. Is there a shorter term we could use?
OS threads?
I think "OS thread" has the same problem as "native thread". I'm personally fine with still using the term "native thread", because this PEP will change how `threading` works too. So you get two interpretations, both being correct:

- "Reimagining Threads natively": The PEP changes how to use OS threads with a native caller.
- "Reimagining Native Threads": The PEP changes how native OS threads, `threading` included, interact with the interpreter.
like this:

.. code-block:: c

In Python, threads are able to interact with an interpreter (e.g., invoke the
I don't think this paragraph is helpful here. Get to the crux: `PyGILState_Ensure`/`PyGILState_Release` is one of the primary ways to get a valid thread state using the C API, and it doesn't work well with subinterpreters because it always creates (or reuses) a thread state for the main interpreter.
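For readers who don't live in this corner of the C API, the pattern under discussion looks roughly like this (a sketch only; `on_event` and the callback wiring are made up for illustration):

```c
#include <Python.h>

/* Hypothetical callback invoked from a thread that was not created by
 * Python (e.g. a thread spawned by a C library). */
static void
on_event(void)
{
    /* PyGILState_Ensure() creates or reuses a thread state for this OS
     * thread, but that thread state is always bound to the *main*
     * interpreter, which is exactly the subinterpreter problem noted
     * above. */
    PyGILState_STATE gstate = PyGILState_Ensure();

    /* Safe to use the C API here, but only against the main interpreter. */
    PyRun_SimpleString("print('hello from a non-Python-created thread')");

    PyGILState_Release(gstate);
}
```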
I think it's useful as a quick intro for people who aren't deeply familiar with how thread states work. I would expect that 99% of extension developers just know the `PyGILState` surface API, so the thread state terminology that the proposal uses might seem foreign. Would you prefer I just link to the documentation?
I think talking about terminology is okay, but it doesn't need to be at the beginning of the abstract. When reading the abstract, I'd want to see:
- The most important problem the PEP is solving
- An outline of the solution, without going into too much detail
called. There's a subtle difference between the two terms, as used in this
PEP:

- "Finalization" refers to an interpreter getting ready to "shut down", in
I don't understand the distinction being made here between finalization and shutdown nor the purpose of making this distinction -- I don't see how "interpreter finalization" and "interpreter shutdown" are different things.
"Finalization" is also pretty overloaded; you can say "interpreter finalization" to be more clear.
It's not important for the main interpreter, but there's an important distinction for subinterpreters, because the `PyInterpreterState` structure becomes invalid after shutdown but not during finalization. That's an important detail for the rationale on why we're not using `PyInterpreterState *` for `PyThreadState_Ensure`.

But I think you're right, anyway. I'll remove this.
On free-threaded builds, lock-ordering deadlocks are still possible
if thread A acquired the lock for object A and then object B, and then
another thread tried to acquire those locks in the reverse order. Free-threading
currently protects against this by releasing locks when the thread state is
detached, making detachment a necessity to prevent deadlocks.
I think this is mixing up a few ideas (critical sections and stop-the-world). You want `Py_BEGIN_ALLOW_THREADS` around blocking operations to prevent deadlock with GC (or other stop-the-world) events.

Stop-the-world events are analogous to holding the GIL -- one thread has exclusive access to the interpreter -- so you can end up with the same deadlocks.
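A quick sketch of that advice (hedged: the helper name and the `read()` call are illustrative, not from the PEP; error handling omitted):

```c
#include <Python.h>
#include <unistd.h>

/* Hypothetical helper, called with the thread state attached.  The blocking
 * read() is wrapped in Py_BEGIN/END_ALLOW_THREADS so that, while we block,
 * a stop-the-world pause (free-threaded build) or another thread waiting
 * for the GIL (default build) is not stuck waiting on this thread. */
static Py_ssize_t
blocking_read(int fd, char *buf, size_t len)
{
    Py_ssize_t n;
    Py_BEGIN_ALLOW_THREADS
    n = read(fd, buf, len);
    Py_END_ALLOW_THREADS
    return n;
}
```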
Hm, is the current thing I wrote wrong? I think this paragraph is quite long already, and it's only here to quickly explain why `Py_BEGIN_ALLOW_THREADS` is still needed on free-threaded builds. If there's nothing wrong with what's already there, it's probably fine to leave it.
I agree that it's getting long already. Was this expanded in response to feedback on the PEP? I would just have one sentence regarding the free-threaded build at the end of the lock-ordering deadlock paragraph. Something like:

...while thread B holds the GIL and is waiting on the lock. A similar deadlock can occur in the free-threaded build during stop-the-world pauses, which happen during garbage collection.
> Hm, is the current thing I wrote wrong?

The first part is definitely true: "on free-threaded builds, lock-ordering deadlocks are still possible".

The subsequent explanation doesn't make much sense to me. The "releasing locks when the thread state is detached" behavior is only applicable to critical sections, and we don't use `Py_BEGIN_ALLOW_THREADS` directly with critical section APIs. The critical section API implementations do the detaching/reattaching internally.
> Was this expanded in response to feedback on the PEP? I would just have one sentence regarding the free-threaded build at the end of the lock-ordering deadlock paragraph.

Ok, yeah, I'll do it this way.

> The critical section API implementations do the detaching/reattaching internally.

I was thinking of something like this:
Py_BEGIN_CRITICAL_SECTION(whatever);
acquire_os_lock(); // NOT PyMutex
Py_END_CRITICAL_SECTION();
You still need to have `Py_BEGIN_ALLOW_THREADS` to release that critical section. Otherwise you can get lock-ordering deadlocks, right?
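Concretely, the shape being asked about would be something like this (a sketch that reuses the hypothetical `acquire_os_lock()` and `whatever` from the snippet above; whether this is actually required is the open question here):

```c
Py_BEGIN_CRITICAL_SECTION(whatever);
/* Detach around the blocking OS lock so that, on the free-threaded build,
 * the critical section can be suspended while we wait instead of being
 * held across the acquire. */
Py_BEGIN_ALLOW_THREADS
acquire_os_lock();  // NOT PyMutex
Py_END_ALLOW_THREADS
Py_END_CRITICAL_SECTION();
```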
Daemon threads can cause finalization deadlocks
***********************************************
Daemon Threads Can Deadlock Finalization
****************************************
The scenario that leads to deadlock here happens when you acquire a lock during an object destructor/finalizer that's also acquired elsewhere. But if you're doing that, non-daemon threads won't save you, because you will still occasionally deadlock with the GC.
Hmm, I wasn't aware of this. That sounds like it compromises some of the motivation behind the PEP. Could you go into a little more detail here?
Python's GC runs object finalizers (i.e., `__del__` methods) in the same thread that triggered the GC, so you can end up trying to reacquire a lock you already hold.

Here's an example from Stack Overflow: https://stackoverflow.com/questions/18774401/self-deadlock-due-to-garbage-collector-in-single-threaded-code

Java runs finalizers in their own thread to avoid this sort of problem.

Generally, the "fix" in Python is to avoid acquiring locks in `__del__` functions.
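A C-level rendering of the same failure mode, in the spirit of that Stack Overflow example (a sketch; the finalizer, the cache, and the lock are all made up for illustration):

```c
#include <Python.h>
#include <pthread.h>

static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical tp_finalize slot: runs in whichever thread happened to
 * trigger the GC collection. */
static void
entry_finalize(PyObject *self)
{
    pthread_mutex_lock(&cache_lock);   /* self-deadlock if this thread already holds it */
    /* ... remove `self` from the cache ... */
    pthread_mutex_unlock(&cache_lock);
}

static void
cache_store(PyObject *key, PyObject *value)
{
    pthread_mutex_lock(&cache_lock);
    /* Any allocation here may trigger a GC run, which may call
     * entry_finalize() on this very thread while cache_lock is held. */
    PyObject *pair = PyTuple_Pack(2, key, value);
    /* ... insert `pair` into the cache ... */
    Py_XDECREF(pair);
    pthread_mutex_unlock(&cache_lock);
}
```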
Ah, that makes sense; I wasn't taking the garbage collector into account. I was imagining some C code that acquired a lock and got hung before calling into Python. But I think you're right that `PyThreadState_Ensure` isn't sufficient to fix `tp_finalize`/`__del__` deadlocks. That's not good.
> Java runs finalizers in their own thread to avoid this sort of problem.
Thinking out loud--it's not out of the question to take a similar approach for Python, at least for non-daemon threads. I'm not too sure what `gc.garbage` does in modern versions, but in theory, we could put objects collected under a non-daemon thread state into a garbage list, and then finalize them all in a dedicated thread.
peps/pep-0788.rst (outdated)
no closure, it's not possible for the caller to pass any objects or
interpreter-specific data, so it's completely safe to choose the main
I don't see how "it's completely safe to choose the main interpreter" follows from "the callback has no closure".
Maybe these use cases (like `readline`) only make sense in the main interpreter, but choosing the correct interpreter seems important.
If you don't have a closure argument/callback parameter or some other workaround, then you can't pass interpreter-specific data, so you're good to assume that the main interpreter is the one you want. It might not be the right choice, but by "totally safe", I mean that things will just raise a completely safe exception (e.g., an `AttributeError`), rather than crashing through using an object from a different interpreter.

So basically, if you can access state from the caller, then you can also store the interpreter reference, and if not, then there's no way to access invalid cross-interpreter state anyway.
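To make that concrete, here's roughly what I mean (a sketch using the proposed `PyInterpreterRef`/`PyThreadState_Ensure`/`PyThreadState_Release` names from this PR; the exact signatures and semantics are the PEP's to define, error handling is omitted, and `handler_state`/`on_library_event` are hypothetical):

```c
#include <Python.h>

/* If the C library's callback takes a closure pointer, the caller can pass
 * interpreter-specific data, including which interpreter to attach to. */
typedef struct {
    PyInterpreterRef interp;   /* reference captured when the callback was registered */
    PyObject *callable;        /* interpreter-specific object to invoke */
} handler_state;

static void
on_library_event(void *arg)
{
    handler_state *state = (handler_state *)arg;

    /* Proposed API: attach this thread to the requested interpreter. */
    PyThreadState_Ensure(state->interp);
    PyObject *res = PyObject_CallNoArgs(state->callable);
    Py_XDECREF(res);
    PyThreadState_Release();
}

/* If the callback has no closure parameter, there is nowhere to stash an
 * interpreter reference or any other per-interpreter object, so assuming
 * the main interpreter is the only workable default. */
```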
The same places where …
Yes, I think it's worth specifying. The constraint "when the current interpreter doesn't match the requested interpreter" makes me think that you can end up with more thread states per interpreter + OS thread.
Here's what I was envisioning:

// No tstate ever in this thread
// current: NULL
PyThreadState_Ensure(main_interp); // New thread state for the main interpreter
// current: main
PyThreadState_Ensure(main_interp); // Do nothing
// current: main
Py_BEGIN_ALLOW_THREADS; // Detach it
// current: NULL
PyThreadState_Ensure(main_interp); // Reattach the old thread state
// current: main
PyThreadState_Release(); // Detach it again
// current: NULL
Py_END_ALLOW_THREADS; // Reattach
// current: main
PyThreadState_Ensure(subinterp); // New thread state for the subinterpreter
// current: subinterp
PyThreadState_Ensure(main_interp); // New thread state for the main interpreter
// current: main

Basically, …
cc @pablogsal: the PEP update I told you about. You might want to have a look.
The key changes I made here are:

- … (`PyInterpreterRef` and `PyInterpreterWeakRef *`).
- … `PyThreadState_GetDaemon()` and what needs to happen to `threading` for that to work.
- … (… `PyInterpreterState *` for refs).
- … `PyThreadState_Ensure` stealing a strong reference.

📚 Documentation preview 📚: https://pep-previews--4400.org.readthedocs.build/pep-0788/