8000 bpo-44336: Prevent hanging on child process handles by jkloth · Pull Request #26578 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

bpo-44336: Prevent hanging on child process handles #26578

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Mar 22, 2022

Conversation

jkloth
Copy link
Contributor
@jkloth jkloth commented Jun 7, 2021

Replace the child process typeperf.exe with a daemon thread
that reads the performance counters directly. This prevents
the issues that arise from inherited handles in grandchild processes
(see https://bugs.python.org/issue37531#msg350191).

https://bugs.python.org/issue44336

Replace the child process `typeperf.exe` with a daemon thread
that reads the performance counters directly.  This prevents
the issues that arise from inherited handles in grandchild processes
(see https://bugs.python.org/issue37531#msg350191).
@jkloth
Copy link
Contributor Author
jkloth commented Jun 7, 2021

Please add the test-with-buildbots label to ensure this actually fixes the buildbots!

@isidentical
Copy link
Member

CC: @pablogsal

@zware zware added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jun 7, 2021
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @zware for commit 96768d9 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jun 7, 2021
@zware
Copy link
Member
zware commented Jun 7, 2021

Do note that it's currently failing on Windows CI, but I went ahead and added the label.

@jkloth
Copy link
Contributor Author
jkloth commented Jun 7, 2021

It seems that buildbot runs were not sheduled for the Windows buildbots. :( Is it because they are paused (due to excessive exceptions?) or that the CI failed?

@zware
Copy link
Member
zware commented Jun 7, 2021

I think it's just backlog, though I notice that your worker is currently "paused" (though I don't know why).

@jkloth
Copy link
Contributor Author
jkloth commented Jun 7, 2021

Woohoo! The PR allows the buildbot test stage to finish successfully! Well, in this case, with an error, but it no longer hangs waiting for the buildbot controller process to kill it.

@jkloth
Copy link
Contributor Author
jkloth commented Jun 7, 2021

Follow up for those not watching the buildbots. Before this PR:
https://buildbot.python.org/all/#/builders/593/builds/58
and after:
https://buildbot.python.org/all/#/builders/593/builds/59

@jkloth
Copy link
Contributor Author
jkloth commented Jun 7, 2021

Likewise for the other Windows 10 buildbot (bolen-windows10), before:
https://buildbot.python.org/all/#/builders/577/builds/54
and after:
https://buildbot.python.org/all/#/builders/577/builds/55

@pablogsal pablogsal requested a review from vstinner June 7, 2021 21:03
@pablogsal
Copy link
Member

Seems that there are some test failures in the CI in Windows? @jkloth could you rebase this PR against master, rerun the CI and maybe take a look? Thanks!

Also it would be great if someone with Windows experience could take a quick look at this. Ideally @zooba if is available :)

@jkloth
Copy link
Contributor Author
jkloth commented Jun 8, 2021

Seems that there are some test failures in the CI in Windows?

Yes, this is expected due to the current goings on with bpo-11105. The tests are aborting, if you scroll back to their initial test. The whole point of this PR is to see these failures from the buildbots instead of being masked by timeout exceptions from the builder process.

Note that I cannot reproduce any aborts locally when using a release build, however I am running Windows build 19043 (21H1) vs 17763 on the CI. The OS version differences have been known to influence test results.

@jkloth
Copy link
Contributor Author
jkloth commented Jun 8, 2021

There is one problem with the approach in the PR, test__xxsubinterpreters (specifically the is_running() function) cannot be run sequentially, that is, in the same process as the test runner. It fails when more than one thread state is active. It would be possible to skip this particular test case when not run in sub-process mode (-jN) with some additional plumbing.

If that approach is deemed worthy, I'll update the PR to include that as well. I really would like to see the failures on the Windows buildbots show prominently again, instead of being masked behind builder timeout exceptions.

jkloth added 2 commits June 9, 2021 19:09
This prevents inadvertent interactions with tests expecting a single
threaded environment.  Displaying load is really only hel
8000
pful for
buildbots running in multiprocess mode anyway.
@jkloth
Copy link
Contributor Author
jkloth commented Jun 10, 2021

I've addressed the conflicts with test__xxsubinterpreters in the latest commit. In short, only use load for multiprocess test runs.

@github-actions
Copy link

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Jul 11, 2021
@jkloth
Copy link
Contributor Author
jkloth commented Feb 18, 2022

@eryksun
Copy link
Contributor
eryksun commented Mar 21, 2022

This would be less fragile and more useful in general if support were added to _winapi for PdhOpenQueryW(), PdhAddEnglishCounterW(), PdhCollectQueryData(), PdhCollectQueryDataEx(), PdhGetFormattedCounterValue(), and PdhCloseQuery(). The English counter name is "\System\Processor Queue Length" -- no need to worry about localized names, hard-coded values, or hard-coded struct sizes and offsets.

@zooba zooba added skip news needs backport to 3.9 only security fixes needs backport to 3.10 only security fixes and removed stale Stale PR or inactive for long period of time. labels Mar 21, 2022
@zooba zooba added the tests Tests in the Lib/test dir label Mar 21, 2022
@zooba zooba closed this Mar 21, 2022
@zooba zooba reopened this Mar 21, 2022
@zooba zooba merged commit 19058b9 into python:main Mar 22, 2022
@miss-islington
Copy link
Contributor

Thanks @jkloth for the PR, and @zooba for merging it 🌮🎉.. I'm working now to backport this PR to: 3.9, 3.10.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Sorry @jkloth and @zooba, I had trouble checking out the 3.10 backport branch.
Please backport using cherry_picker on command line.
cherry_picker 19058b9f6271338bcc46b7d30fe79a83990cc35c 3.10

@miss-islington
Copy link
Contributor

Sorry, @jkloth and @zooba, I could not cleanly backport this to 3.9 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker 19058b9f6271338bcc46b7d30fe79a83990cc35c 3.9

@zooba zooba added needs backport to 3.10 only security fixes and removed needs backport to 3.10 only security fixes labels Mar 22, 2022
@miss-islington
Copy link
Contributor

Thanks @jkloth for the PR, and @zooba for merging it 🌮🎉.. I'm working now to backport this PR to: 3.10.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Sorry @jkloth and @zooba, I had trouble checking out the 3.10 backport branch.
Please backport using cherry_picker on command line.
cherry_picker 19058b9f6271338bcc46b7d30fe79a83990cc35c 3.10

@bedevere-bot
Copy link

GH-32048 is a backport of this pull request to the 3.10 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.10 only security fixes label Mar 22, 2022
jkloth added a commit to jkloth/cpython that referenced this pull request Mar 22, 2022
…indows (pythonGH-26578)

Replace the child process `typeperf.exe` with a daemon thread that reads the performance counters directly.  This prevents the issues that arise from inherited handles in grandchild processes (see issue37531 for discussion).

We only use the load tracker when running tests in multiprocess mode. This prevents inadvertent interactions with tests expecting a single threaded environment.  Displaying load is really only helpful for buildbots running in multiprocess mode anyway.
(cherry picked from commit 19058b9)

Co-authored-by: Jeremy Kloth <jeremy.kloth@gmail.com>
@bedevere-bot
Copy link

GH-32050 is a backport of this pull request to the 3.9 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.9 only security fixes label Mar 22, 2022
jkloth added a commit to jkloth/cpython that referenced this pull request Mar 22, 2022
…ndows (pythonGH-26578)

Replace the child process `typeperf.exe` with a daemon thread that reads the performance counters directly.  This prevents the issues that arise from inherited handles in grandchild processes (see issue37531 for discussion).

We only use the load tracker when running tests in multiprocess mode. This prevents inadvertent interactions with tests expecting a single threaded environment.  Displaying load is really only helpful for buildbots running in multiprocess mode anyway..
(cherry picked from commit 19058b9)

Co-authored-by: Jeremy Kloth <jeremy.kloth@gmail.com>
zooba pushed a commit that referenced this pull request Mar 22, 2022
…H-26578)

Replace the child process `typeperf.exe` with a daemon thread that reads the performance counters directly.  This prevents the issues that arise from inherited handles in grandchild processes (see issue37531 for discussion).

We only use the load tracker when running tests in multiprocess mode. This prevents inadvertent interactions with tests expecting a single threaded environment.  Displaying load is really only helpful for buildbots running in multiprocess mode anyway.

Co-authored-by: Jeremy Kloth <jeremy.kloth@gmail.com>
zooba pushed a commit that referenced this pull request Mar 22, 2022
…H-26578)

Replace the child process `typeperf.exe` with a daemon thread that reads the performance counters directly.  This prevents the issues that arise from inherited handles in grandchild processes (see issue37531 for discussion).

We only use the load tracker when running tests in multiprocess mode. This prevents inadvertent interactions with tests expecting a single threaded environment.  Displaying load is really only helpful for buildbots running in multiprocess mode anyway..

Co-authored-by: Jeremy Kloth <jeremy.kloth@gmail.com>
@jkloth jkloth deleted the issue44336 branch April 5, 2022 22:28
hello-adam pushed a commit to hello-adam/cpython that referenced this pull request Jun 2, 2022
…ythonGH-26578)

Replace the child process `typeperf.exe` with a daemon thread that reads the performance counters directly.  This prevents the issues that arise from inherited handles in grandchild processes (see issue37531 for discussion).

We only use the load tracker when running tests in multiprocess mode. This prevents inadvertent interactions with tests expecting a single threaded environment.  Displaying load is really only helpful for buildbots running in multiprocess mode anyway..

Co-authored-by: Jeremy Kloth <jeremy.kloth@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
skip news tests Tests in the Lib/test dir
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants
0