io.TextIOWrapper silently ignores partial writes if buffer is unbuffered #85393
Without unbuffered mode, it works as expected:

% python -c "import sys; sys.stdout.write('x'*4294967296)" | wc -c
% python -c "import sys; print('x'*4294967296)" | wc -c

With unbuffered mode, writes get truncated to 2147479552 bytes on my Linux machine:

% python -u -c "import sys; sys.stdout.write('x'*4294967296)" | wc -c
% python -u -c "import sys; print('x'*4294967296)" | wc -c

I didn’t try, but it’s probably an even bigger problem on Windows, where writes might be limited to 32767 bytes: https://github.com/python/cpython/blob/v3.9.0b4/Python/fileutils.c#L1585

Without unbuffered mode:

% python -c 'import sys; print(sys.stdout.buffer)'

With unbuffered mode:

% python -u -c 'import sys; print(sys.stdout.buffer)'
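For reference, the layering difference behind those last two commands can be inspected directly. A minimal sketch, assuming a regular CPython build (with -u the text layer sits directly on the raw FileIO, whose write() may be partial; without -u a BufferedWriter sits in between):

```python
# Inspect what sys.stdout is layered on (run once with and once without -u).
import io
import sys

binary = sys.stdout.buffer
print(type(binary))                      # BufferedWriter normally, FileIO under -u
print(isinstance(binary, io.RawIOBase))  # True only in the unbuffered case
```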
To fix the problem, it has to be ensured that either
In the first two possibilities it probably makes sense to tighten the contract of
2147479552 is the 0x7ffff000-byte limit documented for write() on Linux (source: https://man7.org/linux/man-pages/man2/write.2.html). The limit could be even smaller in other circumstances or on other systems. I’m adding Victor Stinner to the nosy list, as he added the code limiting writes to the console on Windows in e0daff1, and he might have an opinion on the topic.
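As a quick sanity check on that number, 0x7ffff000 is 2 GiB minus one 4 KiB page:

```python
# 0x7ffff000 == 2**31 - 4096, the per-call write() ceiling documented for Linux
assert 0x7ffff000 == 2**31 - 4096 == 2147479552
```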
% python -u -c "import sys; print(sys.stdout.write('x'*4294967296), file=sys.stderr)" | wc -c

So the possibility "users of TextIOWrapper.write() handle partial writes themselves" doesn’t work either: write() reports the full size even though the output was truncated.
It’s possible to trigger the problem on Unix with much smaller sizes, e.g. by interrupting the write() with a signal handler (even if the signal handler doesn’t do anything). The following script starts a subprocess doing a 16MiB write and sends a signal, which is handled but is a no-op, after reading a bit from the pipe:

import signal
import subprocess
import sys

CHILD_PROCESS = '''
import signal, sys
signal.signal(signal.SIGINT, lambda *x: None)
written = sys.stdout.write('x' * 16777216)
print('written:', written, file=sys.stderr, flush=True)
'''

proc = subprocess.Popen(
    [sys.executable, '-u', '-c', CHILD_PROCESS],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
read = len(proc.stdout.read(1))
proc.send_signal(signal.SIGINT)
read += len(proc.stdout.read())
stdout, stderr = proc.communicate()
assert stdout == b''
print('stderr:', stderr)
assert read == 16777216, "read: {}".format(read)

% python3 test_interrupted_write.py
stderr: b'written: 16777216\n'
Traceback (most recent call last):
  File "test_interrupted_write.py", line 24, in <module>
    assert read == 16777216, "read: {}".format(read)
AssertionError: read: 69632

If I remove the '-u' that gets passed to the subprocess:

% python3 test_interrupted_write.py
cc Antoine Pitrou, who was involved in the io module design.

Currently, the io.TextIOWrapper implementation doesn't handle partial writes: it doesn't fully support an unbuffered 'buffer' object. It should either handle partial writes internally, or it should inject a buffered writer between itself (TextIOWrapper) and the unbuffered buffer, so that partial writes would be handled by the buffered writer. The socket.socket class has a sendall() method which helps to handle such problems. In the io module, write() can sometimes do a partial write (an unbuffered writer like FileIO) and sometimes not (a buffered writer like BufferedWriter).

== C implementation ==

Modules/_io/textio.c. The _io_TextIOWrapper_write_impl() function puts the encoded string into an internal pending_bytes list. If needed, it calls flush(): _textiowrapper_writeflush(). The pseudo-code of _textiowrapper_writeflush() is to call "self.buffer.write(b)" where b is made of all "pending bytes". The write() result is ignored: a partial write is silently ignored.

== Python implementation ==

_pyio.TextIOWrapper.write() simply calls "self.buffer.write(b)". It ignores partial writes silently.
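A minimal sketch of the first option (the text layer retrying the raw write itself, in the spirit of socket.sendall()); the helper name and structure below are illustrative only, not the actual implementation:

```python
import errno


def _write_all(raw, data):
    """Call raw.write() until all of `data` has been accepted.

    Sketch only: assumes raw.write() returns the number of bytes written
    (as FileIO does) and eventually makes progress.
    """
    view = memoryview(data)
    while view:
        n = raw.write(view)
        if n is None:
            # A non-blocking raw stream accepted nothing this time.
            raise BlockingIOError(errno.EAGAIN, "write could not complete without blocking")
        view = view[n:]
```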
Oh, this is a serious problem. AFAIK TextIOWrapper initially supported only buffered writers, and support for non-buffered writers was added later.

We can make TextIOWrapper call write() of the underlying binary stream repeatedly, but that will break code which uses TextIOWrapper with other file-like objects whose write() method does not return the number of bytes written. For example:

from io import TextIOWrapper

buffer = []

class Writer:
    write = buffer.append

t = TextIOWrapper(Writer())

Even if we fix writing to sys.stdout and sys.stderr, there will be a problem with programs which write directly to sys.stdout.buffer or use open(buffering=0). This is a complex issue and it needs a complex solution.
We can detect whether write() returns an integer, and only call write() in a loop (until all bytes are written) when it does.
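A minimal sketch of that detection (illustrative names, not the actual patch): loop only when the underlying write() reports progress as an int, otherwise keep today's single-call behaviour so duck-typed writers keep working:

```python
def _flush_pending(buffer, data):
    # Sketch only: `data` is the encoded pending bytes from the text layer.
    n = buffer.write(data)
    if not isinstance(n, int):
        # The writer gives no byte count (e.g. a write() that returns None,
        # like the Writer example above), so keep the single-call behaviour.
        return
    view = memoryview(data)[n:]
    while view:
        n = buffer.write(view)
        view = view[n:]  # assumes each call eventually makes progress
```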
This check and workaround in _Py_write_impl doesn't affect disk files or pipes. It only affects character devices for which isatty() is true, and really it's only concerned with console screen-buffer files. The workaround is not required in Windows 8.1+, so it can be removed in Python 3.9+.
Any updates on this serious issue? A lot of Python code out there will fall for this. A user's expectation of an unbuffered layer is that it either finishes the job or raises an exception, and the io module should do exactly that (the documentation should be corrected to reflect that, of course). The user might also be given the option (or an alternative object) to switch from finish-or-raise to returning the problem to the caller, e.g. in the usual form of a count of bytes read/written; that mode would guarantee that the underlying operation is only called once.
Taking a look at this during the PyCon 2025 development sprints. Planning to try and handle both the case where things block as well as the unbuffered / partial-write case. I'm not planning to do a full / comprehensive audit so that "all blocking I/O" works under Text I/O, but the "write" path should be fairly comprehensive (it all turns into one function).
Also improves non-blocking support (pythongh-57531) and unbuffered support (pythongh-61606)