8000 io.TextIOWrapper ignores silently partial write if buffer is unbuffered · Issue #85393 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

io.TextIOWrapper ignores silently partial write if buffer is unbuffered #85393

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
manueljacob mannequin opened this issue Jul 6, 2020 · 10 comments
Open

io.TextIOWrapper ignores silently partial write if buffer is unbuffered #85393

manueljacob mannequin opened this issue Jul 6, 2020 · 10 comments
Labels
3.12 only security fixes 3.13 bugs and security fixes 3.14 bugs and security fixes extension-modules C modules in the Modules dir topic-IO type-bug An unexpected behavior, bug, or error

Comments

@manueljacob
Copy link
Mannequin
manueljacob mannequin commented Jul 6, 2020
BPO 41221
Nosy @pitrou, @vstinner, @methane, @serhiy-storchaka, @eryksun, @manueljacob

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2020-07-06.17:33:53.629>
labels = ['type-bug', '3.8', 'expert-IO']
title = 'io.TextIOWrapper ignores silently partial write if buffer is unbuffered'
updated_at = <Date 2020-07-08.10:18:50.897>
user = 'https://github.com/manueljacob'

bugs.python.org fields:

activity = <Date 2020-07-08.10:18:50.897>
actor = 'eryksun'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['IO']
creation = <Date 2020-07-06.17:33:53.629>
creator = 'mjacob'
dependencies = []
files = []
hgrepos = []
issue_num = 41221
keywords = []
message_count = 8.0
messages = ['373151', '373167', '373203', '373207', '373216', '373287', '373289', '373300']
nosy_count = 6.0
nosy_names = ['pitrou', 'vstinner', 'methane', 'serhiy.storchaka', 'eryksun', 'mjacob']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue41221'
versions = ['Python 3.8']

Linked PRs

@manueljacob
Copy link
Mannequin Author
manueljacob mannequin commented Jul 6, 2020

Without unbuffered mode, it works as expected:

% python -c "import sys; sys.stdout.write('x'*4294967296)" | wc -c
4294967296

% python -c "import sys; print('x'*4294967296)" | wc -c
4294967297

With unbuffered mode, writes get truncated to 2147479552 bytes on my Linux machine:

% python -u -c "import sys; sys.stdout.write('x'*4294967296)" | wc -c
2147479552

% python -u -c "import sys; print('x'*4294967296)" | wc -c
2147479553

I didn’t try, but it’s probably an even bigger problem on Windows, where writes might be limited to 32767 bytes: https://github.com/python/cpython/blob/v3.9.0b4/Python/fileutils.c#L1585

Without unbuffered mode, sys.stdout.buffer is a io.BufferedWriter object.

% python -c 'import sys; print(sys.stdout.buffer)'
<_io.BufferedWriter name='<stdout>'>

With unbuffered mode, sys.stdout.buffer is a io.FileIO object.

% python -u -c 'import sys; print(sys.stdout.buffer)'
<_io.FileIO name='<stdout>' mode='wb' closefd=False>

io.BufferedWriter implements the io.BufferedIOBase interface. io.BufferedIOBase.write() is documented to write all passed bytes. io.FileIO implements the io.RawIOBase interface. io.RawIOBase.write() is documented to be able to write less bytes than passed.

io.TextIOWrapper.write() is not documented to write all characters it has been passed, but e.g. print() relies on that.

To fix the problem, it has to be ensured that either

  • sys.stdout.buffer is an object that guarantees that all bytes passed to its write() method are written (e.g. deriving from io.BufferedIOBase), or
  • io.TextIOWrapper calls the write() method of its underlying binary stream until all bytes have been written, or
  • users of io.TextIOWrapper call write() until all characters have been written.

In the first two possibilities it probably makes sense to tighten the contract of io.TextIOBase.write to guarantee that all passed characters are written.

@manueljacob manueljacob mannequin added 3.8 (EOL) end of life topic-IO type-bug An unexpected behavior, bug, or error labels Jul 6, 2020
@manueljacob
Copy link
Mannequin Author
manueljacob mannequin commented Jul 6, 2020

2147479552 is the 0x7ffff000 bytes limit documented for write() on Linux (source: https://man7.org/linux/man-pages/man2/write.2.html). The limit could be even smaller in other circumstances or other systems.

I’m adding Victor Stinner to the nosy list, as he added the code limiting writes to the console on Windows in e0daff1, and he might have an opinion on the topic.

@manueljacob
Copy link
Mannequin Author
manueljacob mannequin commented Jul 7, 2020

io.TextIOWrapper.write() returns the length of the passed string instead of the actually written number of characters.

% python -u -c "import sys; print(sys.stdout.write('x'*4294967296), file=sys.stderr)" | wc -c
4294967296
2147479552

So the possibility "users of io.TextIOWrapper call write() until all characters have been written" would not be sufficient.

@manueljacob
Copy link
Mannequin Author
manueljacob mannequin commented Jul 7, 2020

It’s possible to trigger the problem on Unix with much smaller sizes, e.g. by interrupting the write() with a signal handler (even if the signal handler doesn’t do anything). The following script starts a subprocess doing a 16MiB write and sends a signal, which is handled but is a no-op, after reading a bit from the pipe:

import signal
import subprocess
import sys

CHILD_PROCESS = '''
import signal, sys
signal.signal(signal.SIGINT, lambda *x: None)
written = sys.stdout.write('x' * 16777216)
print('written:', written, file=sys.stderr, flush=True)
'''

proc = subprocess.Popen(
    [sys.executable, '-u', '-c', CHILD_PROCESS],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
read = len(proc.stdout.read(1))
proc.send_signal(signal.SIGINT)
read += len(proc.stdout.read())
stdout, stderr = proc.communicate()
assert stdout == b''
print('stderr:', stderr)
assert read == 16777216, "read: {}".format(read)
% python3 test_interrupted_write.py
stderr: b'written: 16777216\n'
Traceback (most recent call last):
  File "test_interrupted_write.py", line 24, in <module>
    assert read == 16777216, "read: {}".format(read)
AssertionError: read: 69632

If I remove the '-u' that gets passed to the subprocess:

% python3 test_interrupted_write.py
stderr: b'written: 16777216\n'

@vstinner
Copy link
Member
vstinner commented Jul 7, 2020

cc Antoine Pitrou who was involved in io module design.

Currently, the io.TextIOWrapper implementation doesn't handle partial write: it doesn't fully support an unbuffered 'buffer' object.

It should either handle partial write internally, or it should inject a buffered writer between itself (TextIOWrapper) and the unbuffered buffer so handling partial writes who be handled by the buffered writer.

The socket.socket class has a sendall() method which helps to handle such problem. In the io module, sometimes write() can do a partial write (unbuffered writer like FileIO), sometimes it doesn't (buffered writer like BufferedWriter).

== C implementation ==

Modules/_io/text.c. The _io_TextIOWrapper_write_impl() function puts the encoded string into an internal pending_bytes list. If needed, it calls flush(): _textiowrapper_writeflush().

The pseudo-code of _textiowrapper_writeflush() is to call "self.buffer.write(b)" where b is made of all "pending bytes". write() result is ignored: partial write is silently ignored.

== Python implementation ==

_pyio.TextIOWrapper.write() simply calls: "self.buffer.write(b)". It ignores partial write silently.

@vstinner vstinner changed the title Output of print() might get truncated in unbuffered mode io.TextIOWrapper ignores silently partial write if buffer is unbuffered Jul 7, 2020
@vstinner vstinner changed the title Output of print() might get truncated in unbuffered mode io.TextIOWrapper ignores silently partial write if buffer is unbuffered Jul 7, 2020
@serhiy-storchaka
Copy link
Member

Oh, this is a serious problem.

AFAIK TextIOWrapper initially supported only buffered writers, but the support of non-buffered writers was added later. We can make TextIOWrapper calling write() of underlying binary stream repeatedly, but it will break the code which uses TextIOWrapper with other file-like objects whose write() method does not return the number of written bytes. For example:

    buffer = []
    class Writer: write = buffer.append
    t = TextIOWrapper(Writer())

Even if we fix writing sys.stdout and sys.stderr there will be a problem with programs which write directly to sys.stdout.buffer or use open(buffering=0).

This is a complex issue and it needs a complex solution.

@vstinner
Copy link
Member
vstinner commented Jul 8, 2020

but it will break the code which uses TextIOWrapper with other file-like objects whose write() method does not return the number of written bytes.

We can detect if write() doesn't return an integer and don't attempt to call write() in a loop (until all bytes are written) in this case.

8000

@eryksun
Copy link
Contributor
eryksun commented Jul 8, 2020

it’s probably an even bigger problem on Windows, where writes might be
limited to 32767 bytes:

This check and workaround in _Py_write_impl doesn't affect disk files and pipes. It only affects character devices for which isatty is true, and really it's only concerned with console screen-buffer files. The workaround is not required in Windows 8.1+, so it can be removed in Python 3.9+.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@siemer
Copy link
siemer commented Oct 27, 2024

Any updates on this serious issue? A lot of python code out there will fall for this.

The UnbufferedWhatevers’s user’s expectation of this layer is to either finish the job or throw an exception. The io module should do exactly that (and the documentation should be corrected to reflect that, of course).

[The user might also have the option (or an alternative object) to switch from finish-or-throw to return-the-problem-to-the-caller (e.g. in the usual form of bytes read/written).—That mode would guarantee that the underlying op is only called once.]

@picnixz picnixz added extension-modules C modules in the Modules dir 3.12 only security fixes 3.13 bugs and security fixes 3.14 bugs and security fixes and removed 3.8 (EOL) end of life labels Dec 3, 2024
@cmaloney
Copy link
Contributor
8000

Taking a look at this during PyCon 2025 development sprints. Planning to try and handle both cases where things block as well as .write() calls which are partial (respectively BlockingIOError with written count and .write return which doesn't match buffer length). My hope is to emulate the BufferedIO tests around those cases with Text I/O.

I'm not planning to do a full / comprehensive audit of "all blocking I/O" should work under Text I/O, but the "write" path should be fairly comprehensive (it all turns into one function _textiowrapper_writeflush.

cmaloney added a commit to cmaloney/cpython that referenced this issue May 21, 2025
Also improves non-blocking support (pythongh-57531) and unbuffered support (pythongh-61606)
cmaloney added a commit to cmaloney/cpython that referenced this issue May 21, 2025
Also improves non-blocking support (pythongh-57531) and unbuffered support (pythongh-61606)
cmaloney added a commit to cmaloney/cpython that referenced this issue May 21, 2025
Also improves non-blocking support (pythongh-57531) and unbuffered support (pythongh-61606)
cmaloney added a commit to cmaloney/cpython that referenced this issue May 21, 2025
Also improves non-blocking support (pythongh-57531) and unbuffered support (pythongh-61606)
cmaloney added a commit to cmaloney/cpython that referenced this issue May 21, 2025
Also improves non-blocking support (pythongh-57531) and unbuffered support (pythongh-61606)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 only security fixes 3.13 bugs and security fixes 3.14 bugs and security fixes extension-modules C modules in the Modules dir topic-IO type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

6 participants
0