-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
[Bug]: PGF backend crashes at program exit after creating a plot #27437
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
This appears to be a consequence of python/cpython@ce558e6 (disabling creation of new threads at shutdown) combined with the fact that on Windows, Popen._communicate is implemented by running proc.stdout.read() in a new thread with a timeout (POSIX uses selectors and thus doesn't have this problem). I suppose the "easy" way to fix that (again, only for Windows) is to effectively inline Popen._communicate implementation for finalize_latex (dropping out the unneeded parts as we don't actually care about the outputs at that point, nor about the timeouts), but making sure to start the reader threads much earlier (in _setup_latex_process) and letting them just wait until signalled (e.g. via a threading.Event) by finalize_latex to start reading. Edit: perhaps diff --git i/lib/matplotlib/backends/backend_pgf.py w/lib/matplotlib/backends/backend_pgf.py
index 78d15fe29b..29f04f8577 100644
--- i/lib/matplotlib/backends/backend_pgf.py
+++ w/lib/matplotlib/backends/backend_pgf.py
@@ -257,7 +257,7 @@ class LatexManager:
# Open LaTeX process for real work; register it for deletion. On
# Windows, we must ensure that the subprocess has quit before being
# able to delete the tmpdir in which it runs; in order to do so, we
- # must first `kill()` it, and then `communicate()` with it.
+ # must first `kill()` it, and then `wait()` with it.
try:
self.latex = subprocess.Popen(
[mpl.rcParams["pgf.texsystem"], "-halt-on-error"],
@@ -274,7 +274,7 @@ class LatexManager:
def finalize_latex(latex):
latex.kill()
- latex.communicate()
+ latex.wait()
self._finalize_latex = weakref.finalize(
self, finalize_latex, self.latex) is enough in practice? |
Even though the pgf file is correctly created I think the crash is annoying enough (and may leave temporary files lying around perhaps?) to mark this as RC, especially as there's a tentative patch. (A possible issue with using wait() is mentioned at https://docs.python.org/3/library/subprocess.html#subprocess.Popen.wait re: pipes filling up, but I think this may not be a possible issue here?) |
Is it worth opening an issue with upstream as this may not have been understand to be a sideeffect of that change? |
I agree it's a bit annoying but I'm not really sure what they could do to fix that in the general case. One specific thing that could be done, though, is that right now they skip handling stdin/stdout/stderr in separate threads if at least two of them are None and there is no timeout (in that case they can just handle the remaining fd (if any) in the main thread) -- see the definition of Popen.communicate --; however, they could also do so if |
I'll try to do this next Tuesday if someone does not beat me to it. |
I did not get to opening an issue upstream yet. |
A concern with switching from https://docs.python.org/3/library/subprocess.html#subprocess.Popen.wait
but I think in this case we should always be "small" while tearing down a the latex process. Can we do diff --git a/lib/matplotlib/backends/backend_pgf.py b/lib/matplotlib/backends/backend_pgf.py
index 78d15fe29b..ba01622bd0 100644
--- a/lib/matplotlib/backends/backend_pgf.py
+++ b/lib/matplotlib/backends/backend_pgf.py
@@ -257,7 +257,8 @@ class LatexManager:
# Open LaTeX process for real work; register it for deletion. On
# Windows, we must ensure that the subprocess has quit before being
# able to delete the tmpdir in which it runs; in order to do so, we
- # must first `kill()` it, and then `communicate()` with it.
+ # must first `kill()` it, and then `communicate()` with or `wait()` on
+ # it.
try:
self.latex = subprocess.Popen(
[mpl.rcParams["pgf.texsystem"], "-halt-on-error"],
@@ -274,7 +275,10 @@ class LatexManager:
def finalize_latex(latex):
latex.kill()
- latex.communicate()
+ try:
+ latex.communicate()
+ except RuntimeError:
+ latex.wait()
self._finalize_latex = weakref.finalize(
self, finalize_latex, self.latex) |
Indeed, the deadlock seems improbable in this case; I guess the try... except approach is fine too. |
Does somebody with a windows dev machine have the ability to test the proposed workaround? Or are we convinced the ultimate solution is upstream and we should not be implementing the workaround here? This is marked as |
I think we can (blindly) go with the |
On windows on py312 we can not use `.communicate` during process shutdown because it internally uses Python threads which can no longer be created during interpreter shutdown. Closes matplotlib#27437
On windows on py312 we can not use `.communicate` during process shutdown because it internally uses Python threads which can no longer be created during interpreter shutdown. Closes matplotlib#27437
On windows on py312 we can not use `.communicate` during process shutdown because it internally uses Python threads which can no longer be created during interpreter shutdown. Closes matplotlib#27437
Bug summary
After successfully creating a plot using the PGF backend, Python will crash on exit while running cleanup code.
Code for reproduction
Actual outcome
Two exceptions are produced at exit. The first is:
The second, which will not be produced if time elapses between plot creation and program exit (read: a sleep(0.5) call), is:
Expected outcome
No crashes.
Additional information
Workaround that prevents both exceptions is running
weakref.finalize._exitfunc()
before program exit.Workaround that prevents the second exception only is letting time elapse before program exit (like with a
time.sleep(0.5)
call).Operating system
Windows 11 Enterprise 22H2
Matplotlib Version
3.8.2
Matplotlib Backend
pgf
Python version
3.12.0
Jupyter version
No response
Installation
pip
The text was updated successfully, but these errors were encountered: