multiprocessing.Process.is_alive() incorrect if waitpid() was executed concurrently to it · Issue #92881 · python/cpython · GitHub
Open
ElectronicRU opened this issue May 17, 2022 · 3 comments
Labels
topic-multiprocessing type-bug An unexpected behavior, bug, or error

Comments

@ElectronicRU
Copy link

Bug report
As per the title. In my case the os.waitpid() call was triggered via the psutil library, but the problem is easily reproducible using os.waitpid() directly, too.

Reproduction:

Python 3.9.5 (default, Nov 18 2021, 16:00:48) 
[GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from multiprocessing import Process
>>> import os
>>> import time
>>> p = Process(target=time.sleep, args=(10,))
>>> p.start(); os.waitpid(p.pid, 0)
(17587, 0)
>>> p.is_alive()
True
>>> p.join() # Returns immediately
>>> 

I understand it's something of a corner case; however, join() behaves correctly here, so we clearly have enough information to determine whether the process is alive.

By the looks of it, if the process has already been waited on, the poll() method of multiprocessing's Popen returns None, which is interpreted as the process still being alive.
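The underlying mechanism can be seen directly: once the child has been reaped by a raw os.waitpid(), any further wait on that pid fails with ECHILD, which is exactly the error poll() swallows. A minimal sketch of this (assuming a POSIX system where the "fork" start method is available):

```python
import errno
import multiprocessing
import os
import time

# Assumption: a POSIX system where the "fork" start method is available.
ctx = multiprocessing.get_context("fork")
p = ctx.Process(target=time.sleep, args=(0.1,))
p.start()
os.waitpid(p.pid, 0)  # reap the child directly, behind multiprocessing's back

# A second wait fails with ECHILD -- the same error poll() swallows,
# which is why is_alive() keeps reporting True.
got_echild = False
try:
    os.waitpid(p.pid, 0)
except ChildProcessError as e:
    got_echild = e.errno == errno.ECHILD
print(p.is_alive())  # reports True even though the child is gone
```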

Your environment

  • CPython versions tested on: 3.9.5, but main version seems affected
  • Operating system and architecture: x64 Ubuntu Linux
@ElectronicRU ElectronicRU added the type-bug An unexpected behavior, bug, or error label May 17, 2022
@gaogaotiantian
Member

The reason for the bug is that the poll() function does the following:

            try:
                pid, sts = os.waitpid(self.pid, flag)
            except OSError:
                # Child process not yet created. See #1731717
                # e.errno == errno.ECHILD == 10
                return None

However, when an OSError is raised, there are two possibilities:

  • The process is not created yet (as in the comment)
  • The process is done and collected (in this case)

There's no way in that specific function to tell them apart without adding some extra state variables.

Actually, it's more than just poll(). Because the process is never properly collected by Python in this case, there are other side effects. For example, multiprocessing.active_children() will return the wrong data, and Process.exitcode will never be set.

I can't think of a simple way to solve this issue, but this should at least be documented somewhere.
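The side effects described above can be sketched as follows; this is a demonstration of the buggy behavior, assuming a POSIX system with the "fork" start method available:

```python
import multiprocessing
import os
import time

# Assumption: a POSIX system where the "fork" start method is available.
ctx = multiprocessing.get_context("fork")
p = ctx.Process(target=time.sleep, args=(0.1,))
p.start()
os.waitpid(p.pid, 0)  # child reaped outside of multiprocessing

p.join()  # returns immediately, but cannot collect the child
print(p.exitcode)  # stays None: poll() only ever sees ECHILD
print(p in multiprocessing.active_children())  # child is still listed
```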

@Vladyyy
Vladyyy commented Jan 23, 2025

I'm curious whether folks have context on why the multiprocessing Process implementation is not thread-safe? subprocess was made thread-safe a while back.

I solved the problem for my use case with the following snippet, which synchronizes the poll() call. It doesn't address the shared global _children list, though:

import threading

# Assumed imports; the snippet as posted did not include them.
from multiprocessing.context import SpawnProcess
from multiprocessing.popen_spawn_posix import Popen


class ThreadSafePopen(Popen):
    def __init__(self, *args, **kwargs):
        self._poll_lock = threading.Lock()
        super().__init__(*args, **kwargs)

    def poll(self, *args, **kwargs):
        """Thread-safe version of the poll method."""
        with self._poll_lock:
            return super().poll(*args, **kwargs)


class _Process(SpawnProcess):
    def _Popen(self, *args, **kwargs):
        """Use the ThreadSafePopen implementation for thread-safe operations."""
        return ThreadSafePopen(*args, **kwargs)
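The race that the lock above guards against can be reproduced without multiprocessing at all: when two threads wait on the same child, only one can reap it, and the other sees ECHILD, which is the condition poll() misreads. A minimal sketch, assuming a POSIX system:

```python
import errno
import os
import threading
import time

# Assumption: a POSIX system (os.fork is unavailable on Windows).
pid = os.fork()
if pid == 0:  # child: exit after a short sleep
    time.sleep(0.2)
    os._exit(0)

results = []

def reap():
    """Wait on the child; record either the (pid, status) pair or ECHILD."""
    try:
        results.append(os.waitpid(pid, 0))
    except ChildProcessError as e:
        results.append(("ECHILD", e.errno == errno.ECHILD))

threads = [threading.Thread(target=reap) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # one thread reaps (pid, status); the other gets ECHILD
```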

@duaneg
Contributor
duaneg commented Apr 27, 2025

Using os.waitpid acts on the underlying OS primitive and bypasses the higher-level logic in multiprocessing.Process. In particular, there is no way for multiprocessing to recover the child's return code once this happens. This will inevitably cause problems, just like using os.close on a file descriptor would for a file object wrapping it.
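By contrast, staying within the high-level interface keeps the bookkeeping consistent: Process.sentinel can be waited on (for example via multiprocessing.connection.wait) without reaping the child, and join() then collects it properly. A sketch, assuming a POSIX system with the "fork" start method available:

```python
import multiprocessing
import time
from multiprocessing.connection import wait

# Assumption: a POSIX system where the "fork" start method is available.
ctx = multiprocessing.get_context("fork")
p = ctx.Process(target=time.sleep, args=(0.1,))
p.start()

wait([p.sentinel])  # blocks until the child exits, without reaping it
p.join()            # multiprocessing itself calls waitpid()

print(p.exitcode)  # 0: the exit status was collected normally
print(p in multiprocessing.active_children())  # False: bookkeeping is consistent
```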

I think this issue should just be closed.

A documentation update might be worthwhile, but even that is tricky. There are many ways the user could interfere and it isn't obvious what to say or where the note should go. Just doing an os.wait() will cause the same problem if using the fork method (and will hang indefinitely with others).

See also #130895, which is a similar issue caused by a race condition. Since that one goes entirely through the high-level interface, however, it can be fixed, at least in principle.
