8000 gh-88887: Cleanup `multiprocessing.resource_tracker.ResourceTracker` upon deletion by luccabb · Pull Request #130429 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

gh-88887: Cleanup multiprocessing.resource_tracker.ResourceTracker upon deletion #130429

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Mar 20, 2025

Conversation

luccabb
Copy link
Contributor
@luccabb luccabb commented Feb 21, 2025

fixing #88887: multiprocessing: the Resource Tracker process is never reaped

#88887 is only visible when running Python with PID 1. This is because Python relies on kernel's process handling code to clean up terminated processes, which works fine for most cases. But it won't work when running Python as PID 1 and there's no signal processing handler to reap zombies on client code

So to test changes we need a docker instance that starts with python code that leaks the process. For that I'm using https://github.com/viktorvia/python-multi-issue but with a custom Dockerfile that installs my branch of cpython instead: https://gist.github.com/luccabb/a0a48b15cd30746e7619ecb7168ce439

Test plan

before

docker build

(base) luccab@luccab-mbp python-multi-issue % docker build -t multi

observe [python3] <defunct> processes increasing by 1 every 1s:

(base) luccab@luccab-mbp python-multi-issue % docker run -it multi python3 main.py
[1, 4, 9]

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 23:00 pts/0    00:00:00 python3 main.py
root         8     1  0 23:00 pts/0    00:00:00 [python3] <defunct>
root        17     1  0 23:00 pts/0    00:00:00 ps -ef --forest

[1, 4, 9]

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  3 23:00 pts/0    00:00:00 python3 main.py
root         8     1  2 23:00 pts/0    00:00:00 [python3] <defunct>
root        19     1  0 23:00 pts/0    00:00:00 [python3] <defunct>
root        28     1  0 23:00 pts/0    00:00:00 ps -ef --forest

[1, 4, 9]

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  1 23:00 pts/0    00:00:00 python3 main.py
root         8     1  1 23:00 pts/0    00:00:00 [python3] <defunct>
root        19     1  2 23:00 pts/0    00:00:00 [python3] <defunct>
root        30     1  0 23:00 pts/0    00:00:00 [python3] <defunct>
root        39     1  0 23:00 pts/0    00:00:00 ps -ef --forest

after

update Dockerfile to get code from this branch: https://gist.github.com/luccabb/a0a48b15cd30746e7619ecb7168ce439

docker build

(base) luccab@luccab-mbp python-multi-issue % docker build -t multi .

observe no [python3] <defunct> processes:

(base) luccab@luccab-mbp python-multi-issue % docker run -it multi python3 main.py
[1, 4, 9]

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  4 23:02 pts/0    00:00:00 python3 main.py
root        17     1  0 23:02 pts/0    00:00:00 ps -ef --forest

[1, 4, 9]

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  2 23:02 pts/0    00:00:00 python3 main.py
root        28     1  0 23:02 pts/0    00:00:00 ps -ef --forest

[1, 4, 9]

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  1 23:02 pts/0    00:00:00 python3 main.py
root        39     1  0 23:02 pts/0    00:00:00 ps -ef --forest

@luccabb luccabb requested a review from gpshead as a code owner February 21, 2025 23:06
@ghost
Copy link
ghost commented Feb 21, 2025

All commit authors signed the Contributor License Agreement.
CLA signed

# see https://github.com/python/cpython/issues/88887
try:
self._stop()
except:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Catching some exceptions is fine. But catching any exception is not ok. You should not ignore KeyboardInterrupt for example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to catch OSError, TypeError, AttributeError only

@picnixz picnixz changed the title gh-88887: Fix resource tracker leak gh-88887: Cleanup multiprocessing.resource_tracker.ResourceTracker upon deletion Feb 24, 2025
Co-authored-by: Victor Stinner <vstinner@python.org>
@luccabb luccabb requested a review from vstinner February 25, 2025 01:34
@luccabb
Copy link
Contributor Author
luccabb commented Mar 3, 2025

@vstinner good to merge?

# see https://github.com/python/cpython/issues/88887
try:
self._stop()
except (OSError, TypeError, AttributeError):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not confortable with these exceptions. In which case we should expect these exceptions? I would prefer a direct call to self._stop() without try/except.

_stop() should be modified to do nothing if self._pid is None.

    def _stop(self):
        with self._lock:
            if self._pid is None:
                return
            ...

Copy link
Contributor Author
@luccabb luccabb Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to catching AttributeError only which is likely caused by __del__:

del() can be executed during interpreter shutdown. As a consequence, the global variables it needs to access (including other modules) may already have been deleted or set to None.

https://docs.python.org/3/reference/datamodel.html#object.del

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not catching AttributeError here fails some of the tests as the os module is already None when _stop runs

@luccabb luccabb requested a review from vstinner March 7, 2025 00:15
@luccabb
Copy link
Contributor Author
luccabb commented Mar 18, 2025

@gpshead @vstinner

Copy link
Member
@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change mostly LGTM, but here is a review on minor issues.

@@ -0,0 +1 @@
Fixing multiprocessing Resource Tracker process leaking
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Fixing multiprocessing Resource Tracker process leaking
Fixing multiprocessing Resource Tracker process leaking.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the issue specific to Python running as the pid 1? If yes, it might be useful to mention it in the Changelog entry.

Copy link
Contributor Author
@luccabb luccabb Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the leaking happens regardless of PID 1. Python relies on kernel's process handling code to clean up the process, which works fine for most cases.

A case where it won't work is when running Python as PID 1 and there's no signal processing handler to reap zombies on client code

@luccabb luccabb requested a review from vstinner March 19, 2025 18:32
Copy link
Member
@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vstinner vstinner merged commit f53e7de into python:main Mar 20, 2025
43 checks passed
@vstinner vstinner added needs backport to 3.12 only security fixes needs backport to 3.13 bugs and security fixes labels Mar 20, 2025
@miss-islington-app
Copy link

Thanks @luccabb for the PR, and @vstinner for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12.
🐍🍒⛏🤖

@miss-islington-app
Copy link

Thanks @luccabb for the PR, and @vstinner for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

@miss-islington-app
Copy link

Sorry, @luccabb and @vstinner, I could not cleanly backport this to 3.12 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker f53e7de6a84a0f535efb75c3671283b801a1af0f 3.12

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Mar 20, 2025
…cker` upon deletion (pythonGH-130429)

(cherry picked from commit f53e7de)

Co-authored-by: luccabb <32229669+luccabb@users.noreply.github.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
@bedevere-app
Copy link
bedevere-app bot commented Mar 20, 2025

GH-131516 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Mar 20, 2025
@vstinner
Copy link
Member

Merged, thank you.

@luccabb: Automated backport to 3.12 failed, do you want to try to backport the change manually to 3.12?

vstinner added a commit that referenced this pull request Mar 20, 2025
…acker` upon deletion (GH-130429) (#131516)

gh-88887: Cleanup `multiprocessing.resource_tracker.ResourceTracker` upon deletion (GH-130429)
(cherry picked from commit f53e7de)

Co-authored-by: luccabb <32229669+luccabb@users.noreply.github.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
@luccabb
Copy link
Contributor Author
luccabb commented Mar 20, 2025

@vstinner: manually backporting changes to 3.12 on #131530

@bedevere-app
Copy link
bedevere-app bot commented Mar 21, 2025

GH-131530 is a backport of this pull request to the 3.12 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.12 only security fixes label Mar 21, 2025
vstinner added a commit that referenced this pull request Mar 21, 2025
…acker` upon deletion (GH-130429) (#131530)

Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
(cherry picked from commit f53e7de)
@luccabb luccabb deleted the fix_resource_tracker_leak branch March 21, 2025 16:21
@tomMoral
Copy link
Contributor

Hello,

We noticed this change in the logs of joblib.
I did not have time to investigate in great details but from my understanding, this PR might break the intended purpose of the resource_tracker in case of nested parallelism.

The resource_tracker is shared by the main process and all its children, and should be cleaned up only once all processes that share this resource tracker are destroyed. In case of nested parallelism, the same resource tracker is linked to all child processes. The goal of this tracker is to be a safe guard: it should be the last one standing. For this, the original design relied on the fact that a "read" pipe is not readable once the file descriptor of the "write" pipe is destroyed. But with the new design, I think the resource tracker will be called as soon as one child finishes cleanly, even if all other processes are still alive no?

I think in normal behavior, it should probably only be called by the main process no? Sorry if this is just because I misunderstood the PR by reading too fast.

@gpshead
Copy link
Member
gpshead commented Apr 14, 2025

Thanks, please open a new issue with a test scenario and reproduction details.

@tomMoral
Copy link
Contributor

Sorry for the noise, I read too fast... The behavior is still the right one.

seehwan pushed a commit to seehwan/cpython that referenced this pull request Apr 16, 2025
…cker` upon deletion (python#130429)

Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0