bytes.decode doesn't honor error mode · Issue #5542 · RustPython/RustPython · GitHub

bytes.decode doesn't honor error mode #5542


Closed
arihant2math opened this issue Feb 22, 2025 · 2 comments · Fixed by #5546
Labels
C-compat A discrepancy between RustPython and CPython

Comments

@arihant2math
Collaborator

On Windows:

❯ python
Python 3.13.1 (tags/v3.13.1:0671451, Dec  3 2024, 19:06:28) [MSC v.1942 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> b'test_6868_tmp\xed\xb4\x80'.decode("utf-8", "surrogatepass")
'test_6868_tmp\udd00'
>>> exit()
❯ ./target/release/rustpython
Welcome to the magnificent Rust Python 0.4.0 interpreter 😱 🖖
RustPython 3.13.0
Type "help", "copyright", "credits" or "license" for more information.
>>>>> b'test_6868_tmp\xed\xb4\x80'.decode("utf-8", "surrogatepass")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "encodings_utf_8", line 16, in decode
UnicodeDecodeError: ('utf-8', b'test_6868_tmp\xed\xb4\x80', 13, 14, 'invalid continuation byte')
>>>>> exit()

The RustPython web demo also fails, although that build is quite out of date.

@arihant2math added the C-compat (A discrepancy between RustPython and CPython) label Feb 22, 2025
@arihant2math
Collaborator Author

This seems to be an inconsistency between how the Python and Rust Unicode implementations handle these bytes.
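For context, a minimal sketch of why these bytes are contentious: decoding the trailing `\xed\xb4\x80` by hand with the 3-byte UTF-8 layout gives a code point in the surrogate range, which strict UTF-8 decoders reject. The helper name `decode_three_byte_sequence` is hypothetical and only for illustration; it is not part of CPython or RustPython.

```python
# Decode a 3-byte sequence by hand using the UTF-8 layout
# 1110xxxx 10yyyyyy 10zzzzzz. Hypothetical helper, for illustration only.
def decode_three_byte_sequence(seq: bytes) -> int:
    b0, b1, b2 = seq
    if b0 >> 4 != 0b1110:
        raise ValueError("first byte is not a 3-byte lead byte")
    if b1 >> 6 != 0b10 or b2 >> 6 != 0b10:
        raise ValueError("trailing bytes are not continuation bytes")
    # Keep 4 payload bits from the lead byte and 6 from each continuation byte.
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


code_point = decode_three_byte_sequence(b"\xed\xb4\x80")
# Surrogates occupy U+D800..U+DFFF and are not valid Unicode scalar values.
is_surrogate = 0xD800 <= code_point <= 0xDFFF
print(hex(code_point), is_surrogate)  # prints: 0xdd00 True
```

Since a Rust `str` can only hold Unicode scalar values, a lone surrogate such as U+DD00 cannot pass through a strict Rust UTF-8 decode, which would be consistent with the error above.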

@arihant2math
Collaborator Author

Ah, it seems RustPython does not honor the decode error mode:

❯ python
Python 3.13.1 (tags/v3.13.1:0671451, Dec  3 2024, 19:06:28) [MSC v.1942 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> b'test_6868_tmp\xed\xb4\x80'.decode("utf-8", "strict")
Traceback (most recent call last):
  File "<python-input-0>", line 1, in <module>
    b'test_6868_tmp\xed\xb4\x80'.decode("utf-8", "strict")
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 13: invalid continuation byte
>>> b'test_6868_tmp\xed\xb4\x80'.decode("utf-8", "surrogatepass")
'test_6868_tmp\udd00'
>>> 

RustPython, by contrast, complains about an invalid continuation byte no matter which error mode is passed.
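The CPython behavior shown in the transcripts could be captured as a small check, sketched below. The helper `decode_with` is a hypothetical name; the expected values for `strict` and `surrogatepass` come from the CPython 3.13.1 sessions in this issue, and the `ignore` case is an extra assumption about standard CPython behavior.

```python
data = b"test_6868_tmp\xed\xb4\x80"


def decode_with(errors: str):
    """Decode `data` as UTF-8 with the given error mode.

    Returns the decoded string, or the UnicodeDecodeError if decoding fails.
    Hypothetical helper, for illustration only.
    """
    try:
        return data.decode("utf-8", errors)
    except UnicodeDecodeError as exc:
        return exc


strict_result = decode_with("strict")                # expected: an error
surrogatepass_result = decode_with("surrogatepass")  # expected: lone surrogate kept
ignore_result = decode_with("ignore")                # expected: bad bytes dropped

print(type(strict_result).__name__)
print(ascii(surrogatepass_result))
print(ascii(ignore_result))
```

On an interpreter with the bug described here, the `surrogatepass` call would come back as a `UnicodeDecodeError` instead of a string.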

@arihant2math changed the title from "str.decode is inconsistent" to "bytes.decode doesn't honor error mode" Feb 22, 2025