10000 `java.lang.process.UnixProcessGen2`'s `isAlive` has a race-condition on child process death · Issue #4208 · scala-native/scala-native · GitHub
[go: up one dir, main page]

Skip to content
java.lang.process.UnixProcessGen2's isAlive has a race-condition on child process death #4208
@j-mie6

Description

@j-mie6

The isAlive method calls waitpidImpl, which, as per the POSIX spec, will return the ECHILD error if the child process has already terminated (or does not exist). The Scala native implementation of waitpidImpl will throw an IOException in this instance, treating this as an exceptional event. However, isAlive calls waitpidImpl to be able to wait for child termination: if the child dies before isAlive has a chance to wait on it, this will result in the IOException being thrown instead of false being returned. This is specifically an issue when options = WNOHANG is specified (as this allows the child process to terminate without becoming a zombie)

On the JVM, isAlive doesn't throw an exception if the child has already died: in a test-suite with over a thousand runs of a ProcessBuilder, we see that the JVM always passes all tests, but Scala Native runs of the same suite fail tests with this IOException thrown around 5% or more of the time.

@LeeTibbert and I discussed this on the Scala Discord server. Lee made the point that since POSIX's API characterises this as an error, it would be innappropriate to alter the definition of waitpidImpl to swallow ECHILD, since this might affect people doing lower-level C-style operations. Instead, we came up with the following solutions:

  1. isAlive will use a try-catch around waitpidImpl and catch the IOException, which it can then return false with. This has the disadvantage that other legitimate IOExceptions may be suppressed
  2. waitpidImpl is given an additional parameter suppressECHILD which explicitly determines if it handles the ECHILD case or not.

Solution (2) is better, since this is a private method anyway. There are three call sites:

  • isAlive: set flag to true
  • exitValue: calls with WNOHANG, but this means that there could be an exception thrown here as well. It's unclear how this should proceed
  • askZombiesForTheirExitStatus: this does not WNOHANG, so should be set to false (ECHILD here is a legitimate bug)

When we've settled on the behaviour of exitValue I can make a PR to address this

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      0