8000 Null characters in strings cause a C SystemError · Issue #97556 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

Null characters in strings cause a C SystemError #97556

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
apccurtiss opened this issue Sep 26, 2022 · 21 comments
Closed

Null characters in strings cause a C SystemError #97556

apccurtiss opened this issue Sep 26, 2022 · 21 comments
Labels
3.9 only security fixes 3.10 only security fixes 3.11 only security fixes 3.12 only security fixes type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@apccurtiss
Copy link
apccurtiss commented Sep 26, 2022

Crash report

Putting a null byte into a Python string causes a SystemError in Python 3.10, due to a call to strlen in the string parsing library. In Python 3.9, the following example runs without errors:

# -*- coding: latin-1 -*-
"""
<NULL>
"""

In Python 3.10, it raises SystemError: ../Parser/string_parser.c:219: bad argument to internal function.

Internally, the new string_parser library introduced in v3.10.0a1 uses a call to strlen to determine the string size, which is getting thrown off by the null byte. This call is actually unnecessary, as the length has already been calculated by the calling parser and can be retrieved with PyBytes_AsStringAndSize.

Error messages

For single line strings, the error is SystemError: Negative size passed to PyUnicode_New

For multiline strings, the error is SystemError: ../Parser/string_parser.c:219: bad argument to internal function

Linked PRs

@apccurtiss apccurtiss added the type-crash A hard crash of the interpreter, possibly with a core dump label Sep 26, 2022
@vstinner
Copy link
Member

This issue seems like a duplicate of the issue #96670.

@apccurtiss
Copy link
Author

Ah, you're totally right. I found that one earlier, but misunderstood the core bug and re-discovered it. Thanks!

@apccurtiss
Copy link
Author

Actually, I just re-read that issue, and I was a bit over-eager with closing this. I feel like this is a very specific fix which doesn't actually solve their use case: This particular issue only shows up in 3.10, and causes an internal parse error. That linked issue is present in every 3.x version I've tested, including after this fix. I believe a patch elsewhere in the parser is needed for that.

@apccurtiss apccurtiss reopened this Sep 26, 2022
@gvanrossum
Copy link
Member

Can you submit a fix?

@gvanrossum gvanrossum added 3.11 only security fixes 3.10 only security fixes 3.9 only security fixes 3.12 only security fixes labels Sep 26, 2022
@apccurtiss
Copy link
Author

Submitted a fix in this pull request

@bskinn
Copy link
Contributor
bskinn commented Apr 24, 2023

I cannot repro OP's SystemError on this. I do observe Python 3.12's fix for null characters in single-quoted strings (SyntaxError: source code cannot contain null bytes per #97594, when run with 3.12.0a7), so going forward I'm focusing on triple-quoted strings.

For Python 3.8.16, 3.9.16, and 3.10.0a1 (WSL Debian local builds), crasher.py with the following contents parses successfully (without error, I did not check correctness) when executed from the commandline ($ python3.<#> crasher.py):

<BLANKLINE>
'''
<NUL>
'''

For Python 3.10.0b1 (WSL Debian local), 3.10.0 (WSL Debian local), 3.10.11 (WSL Debian local and Win11), 3.11.3 (WSL Debian local and Win11), and (importantly) 3.12.0a7 (WSL Debian local), parsing fails with SyntaxError: unterminated triple-quoted string literal.

Interestingly, the attributed line number of the exception location changes in 3.12.0a7. It's reported as line 3 in 3.12.0a7, and as line 2 for all other versions tested.

Regardless, the triple-quoted string case is not yet fixed in the fashion indicated by #96670 (comment), where strings containing null characters are explicitly rejected with an exception and message like SyntaxError: source code cannot contain null bytes.

@bskinn
Copy link
Contributor
bskinn commented Apr 25, 2023

Attn @pablogsal as author of #97594.

@gvanrossum
Copy link
Member

@lysnikolaou ^^

@lysnikolaou
Copy link
Member

I'll have a look at this some time either today or tomorrow during the sprints.

@vstinner
Copy link
Member
vstinner commented May 4, 2023

Script to reproduce the issues:

with open("script.py", "w") as fp:
    print("# -*- coding: latin-1 -*-", file=fp)
    print('"""', file=fp)
    print('\0', file=fp)
    print('"""', file=fp)

Current behavior:

  • main branch (3.12): SyntaxError: unterminated triple-quoted string literal (detected at line 3)
  • 3.11: SystemError: Parser/string_parser.c:230: bad argument to internal function

Oh, it seems like the main branch was just changed (40 min ago) by PR #104136 (just merged).

@lysnikolaou: Do you consider backporting your change to Python 3.11?

@bskinn
Copy link
Contributor
bskinn commented May 4, 2023

With the merge of #97594 and #104136, I believe what OP has raised is now resolved in main/3.12, and per #96670 (comment) the changes are not to be backported.

Thus, I think this issue can be closed.

@vstinner
Copy link
Member
vstinner commented May 4, 2023

SystemError: Parser/string_parser.c:230: bad argument to internal function sounds like a bug in Python.

It's not as if Python 3.11 accepts NUL characters. The question is more about providing a better error message if it finds a NUL character.

@bskinn
Copy link
Contributor
bskinn commented May 4, 2023

It was @gvanrossum's call not to backport the original fix, and I wouldn't presume to speak authoritatively for him. But, I assume the motivation is the xkcd 1172 aspect of changing the error message string.

@pablogsal
Copy link
Member

I think we should backport this to 3.11

@gvanrossum
Copy link
Member

Okay let’s backport.

@bskinn
Copy link
Contributor
bskinn commented May 4, 2023

Should #97594 be backported also, then?

@lysnikolaou
Copy link
Member

Backporting both makes the most sense to me, since just improving the error message would require much more work with the tokenizer buffer working the way it is. If no one has any serious concerns over this, I can work on backporting the necessary PRs.

@gvanrossum
Copy link
Member

Let’s ask @pablogsal about that one.

@pablogsal
Copy link
Member

Backporting both makes the most sense to me, since just improving the error message would require much more work with the tokenizer buffer working the way it is. If no one has any serious concerns over this, I can work on backporting the necessary PRs.

Yeah, I prefer to backport the restriction to keep the changes to a minimum in 3.11 as making a bugfix for the error message can be much more complex 👍

@lysnikolaou
Copy link
Member

Raising a SyntaxError when parsing NULL bytes has been backported to 3.11. I'm closing this.

@vstinner
Copy link
Member
vstinner commented May 9, 2023

I confirm that 3.11 and main branches now display the same output for ./python script.py (see #97556 (comment) to generate script.py).

  File "/home/vstinner/python/main/../3.11/script.py", line 3
    
SyntaxError: source code cannot contain null bytes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.9 only security fixes 3.10 only security fixes 3.11 only security fixes 3.12 only security fixes type-crash A hard crash of the interpreter, possibly with a core dump
Projects
None yet
Development

No branches or pull requests

6 participants
0