gh-129205: Use os.readinto() in subprocess errpipe_read #129498
1 change: 1 addition & 0 deletions
Misc/NEWS.d/next/Library/2025-02-02-20-55-18.gh-issue-129205.hb1iMy.rst
@@ -0,0 +1 @@
:mod:`subprocess` forkserver now uses a fixed 50,000 byte buffer to read startup errors from the child processes it runs. Previously there could be up to 100,000 bytes read with multiple allocations and copies.
There was nothing special about the 50_000 in the existing code. I put it in there as a fail-safe, to have some limit on this pipe in case something weird happened, since unbounded reads are unwise. In reality the data coming through this error pipe will always be tiny in normally operating processes on normally operating machines. This code is not trying to prevent allocations or limit allocation to a specific size. It's just a failsafe.
This PR would cause it to always allocate and zero fill excessive memory during normal operation.
The norm is that the exception info sent through the pipe, if any, will be smaller than PIPE_BUF (512) and show up in its entirety in the first read call. So this loop isn't expected to do much; if there was an error to report, the second `+=` will be a `+= b''` no-op before it breaks out of the loop. If the existing bounded os.read calls happen to pre-allocate 50k each, they at least should not be zero-filling those. We could change the number to 3500 and be fine if that allocation size actually matters to anyone. I don't think it does.
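The bounded read loop being described can be sketched as follows. This is an illustrative reconstruction, not the actual `subprocess.py` code; the function name `read_errpipe` and the `limit` default are assumptions for the example.

```python
import os

def read_errpipe(fd, limit=50_000):
    # Illustrative sketch of the existing pattern: accumulate bounded
    # os.read() chunks until EOF or the fail-safe limit is reached.
    data = bytearray()
    while len(data) < limit:
        chunk = os.read(fd, limit - len(data))
        if not chunk:  # EOF: the write end was closed without (more) data
            break
        data += chunk  # normally a no-op second pass: data += b''
    return bytes(data)

# Usage: simulate a child reporting a short error through a pipe.
r, w = os.pipe()
os.write(w, b"OSError:1:exec failed")
os.close(w)
print(read_errpipe(r))  # the whole payload arrives in the first read
os.close(r)
```

In the common case the child writes nothing, so the very first `os.read()` returns `b''` and the loop exits after one system call.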
Basically: I think readinto is a regression in this use case. It is normally 0 data read, and always tiny data when there is something to read. The limit is merely a fail-safe. If you have identified performance issues due to the current values, propose lowering them. But let's not complicate things with a pre-allocated zero-filled bytearray and readinto. The normal pattern of system calls will be either:

or with a very rare error to be reported:

The loop is a fail-safe because that's how we're supposed to use read APIs on pipes, rather than relying on the OS mapping a single small < PIPE_BUF write to a single small < PIPE_BUF read on the other end, even though in practice that is what'll happen.
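The two system-call sequences referred to above (the snippets did not survive in this copy of the page) presumably look like the ones demonstrated below. This is an illustrative sketch; `drain` is a hypothetical helper that records the chunk sizes a read loop would observe.

```python
import os

def drain(fd, limit=50_000):
    """Record the chunk sizes seen while draining a pipe, read-loop style."""
    sizes = []
    while True:
        chunk = os.read(fd, limit)
        sizes.append(len(chunk))
        if not chunk:
            break
    return sizes

# Normal case: the child wrote nothing, so the first read hits EOF.
r, w = os.pipe()
os.close(w)
print(drain(r))  # -> [0]
os.close(r)

# Rare error case: one small (< PIPE_BUF) payload, then EOF.
r, w = os.pipe()
os.write(w, b"OSError:2:No such file or directory")
os.close(w)
print(drain(r))  # -> [35, 0]
os.close(r)
```

Either way the loop makes at most two meaningful `read()` calls during normal operation, which is why pre-allocating a large buffer buys nothing here.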
Neither `bytearray` nor `os.read` forces a zero fill currently; they just do `PyMem_Malloc`. `bytearray` does set the byte past the end to zero to guarantee the byte array is null terminated, but it doesn't zero-set the whole array.

- `bytearray` construction: https://github.com/python/cpython/blob/main/Objects/bytearrayobject.c#L136-L152
- `os.read` (`os_read_impl`): https://github.com/python/cpython/blob/main/Modules/posixmodule.c#L11436-L11463
- `PyBytes_FromStringAndSize`: https://github.com/python/cpython/blob/main/Objects/bytesobject.c#L136-L161
- `_PyBytes_FromSize`: https://github.com/python/cpython/blob/main/Objects/bytesobject.c#L119-L133
- `_PyObject_InitVar`: https://github.com/python/cpython/blob/main/Include/internal/pycore_object.h#L430-L447
Can definitely lower the initial allocation from 50_000; the `os.read` currently always allocates that much as well (then downsizes to the actual read size). I'll have a look at making and testing a better errpipe upper bound size.
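The allocate-then-downsize behavior mentioned here is observable from Python: `os.read()` stages a buffer of up to the requested size, but the bytes object it returns is resized down to what was actually read. A small sketch (the payload and the 50 kB request size are just illustrative values):

```python
import os
import sys

r, w = os.pipe()
os.write(w, b"tiny")
os.close(w)

data = os.read(r, 50_000)   # requests up to 50 kB from the pipe
print(len(data))            # -> 4: the result is sized to the actual read
print(sys.getsizeof(data))  # small: the staging buffer was resized down
os.close(r)
```

So the cost under discussion is the transient 50 kB allocation per call, not the size of the object the caller ends up holding.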