Null characters in strings cause a C SystemError #97556
Comments
This issue seems like a duplicate of issue #96670.
Ah, you're totally right. I found that one earlier, but misunderstood the core bug and re-discovered it. Thanks!
Actually, I just re-read that issue, and I was a bit over-eager with closing this. I feel like this is a very specific fix which doesn't actually solve their use case: this particular issue only shows up in 3.10, and causes an internal parse error. That linked issue is present in every 3.x version I've tested, including after this fix. I believe a patch elsewhere in the parser is needed for that.
Can you submit a fix?
Submitted a fix in this pull request
I cannot repro OP's For Python 3.8.16, 3.9.16, and 3.10.0a1 (WSL Debian local builds),
For Python 3.10.0b1 (WSL Debian local), 3.10.0 (WSL Debian local), 3.10.11 (WSL Debian local and Win11), 3.11.3 (WSL Debian local and Win11), and (importantly) 3.12.0a7 (WSL Debian local), parsing fails with Interestingly, the attributed line number of the exception location changes in 3.12.0a7. It's reported as line 3 in 3.12.0a7, and as line 2 for all other versions tested. Regardless, the triple-quoted string case is not yet fixed in the fashion indicated by #96670 (comment), where strings containing null characters are explicitly rejected with an exception and message like
Attn @pablogsal as author of #97594.
@lysnikolaou ^^
I'll have a look at this some time either today or tomorrow during the sprints.
Script to reproduce the issues:
with open("script.py", "w") as fp:
    print("# -*- coding: latin-1 -*-", file=fp)
    print('"""', file=fp)
    print('\0', file=fp)
    print('"""', file=fp)
Current behavior:
Oh, it seems like the main branch was just changed (40 min ago) by PR #104136 (just merged). @lysnikolaou: Do you consider backporting your change to Python 3.11?
With the merge of #97594 and #104136, I believe what OP has raised is now resolved in Thus, I think this issue can be closed.
It's not as if Python 3.11 accepts NUL characters. The question is more about providing a better error message if it finds a NUL character.
It was @gvanrossum's call not to backport the original fix, and I wouldn't presume to speak authoritatively for him. But I assume the motivation is the xkcd 1172 aspect of changing the error message string.
I think we should backport this to 3.11
Okay let’s backport.
Should #97594 be backported also, then?
Backporting both makes the most sense to me, since just improving the error message would require much more work with the tokenizer buffer working the way it is. If no one has any serious concerns over this, I can work on backporting the necessary PRs.
Let’s ask @pablogsal about that one.
Yeah, I prefer to backport the restriction to keep the changes to a minimum in 3.11, as making a bugfix for the error message can be much more complex 👍
Raising a
I confirm that 3.11 and main branches now display the same output for
Crash report
Putting a null byte into a Python string causes a SystemError in Python 3.10, due to a call to strlen in the string parsing library. In Python 3.9, the following example runs without errors:
In Python 3.10, it raises SystemError: ../Parser/string_parser.c:219: bad argument to internal function.
Internally, the new string_parser library introduced in v3.10.0a1 uses a call to strlen to determine the string size, which is getting thrown off by the null byte. This call is actually unnecessary, as the length has already been calculated by the calling parser and can be retrieved with PyBytes_AsStringAndSize.
Error messages
For single line strings, the error is
SystemError: Negative size passed to PyUnicode_New
For multiline strings, the error is
SystemError: ../Parser/string_parser.c:219: bad argument to internal function
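The length mismatch described above can be illustrated from Python with ctypes. This is only a sketch of the general failure mode, not the parser's actual code: c_style_length is a hypothetical helper that mimics what strlen sees (the bytes up to the first NUL), and the sample literal is made up for illustration.

```python
import ctypes


def c_style_length(data: bytes) -> int:
    """Length as a NUL-terminated C string would report it (hypothetical helper)."""
    # create_string_buffer copies the bytes and appends a trailing NUL.
    buf = ctypes.create_string_buffer(data)
    # .value reads the buffer as a NUL-terminated string, so it stops at the
    # first NUL byte, which is the same view strlen has of the data.
    return len(buf.value)


literal = b"abc\x00def"  # illustrative string body with an embedded NUL

print(c_style_length(literal))  # 3: what a strlen-style scan sees
print(len(literal))             # 7: the real size, tracked separately
```

Carrying the size alongside the buffer, which is what PyBytes_AsStringAndSize provides on the C side, avoids the discrepancy between the two numbers.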