bpo-31592: Fix an assertion failure in Python/ast.c in case of a bad unicodedata.normalize() #3767
Added another change to the patch, to fix the bug that Serhiy mentioned in https://bugs.python.org/issue31592#msg303043.
Python/ast.c
Outdated
id2 = PyObject_Call(c->c_normalize, c->c_normalize_args, NULL);
/* Use _PyObject_FastCall() this way to conceal c->c_normalize_args
   from the user. */
id2 = _PyObject_FastCall(c->c_normalize,
Don't use c->c_normalize_args. Use just a 2-element C array.
All right.
Would you mind mentioning why this is better?
The problem is with reusing the same tuple for passing arguments. If the fake normalize() saves a reference to the tuple, it will see an immutable tuple being mutated. The fake would have to be implemented in C; I can't reproduce the problem with Python code.
Just allocate a 2-element array on the stack, fill it with the arguments, and pass it to the function. This will significantly simplify the code too.
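The hazard described above can be illustrated with a loose Python analogy. This is only a sketch: the real issue involves a C-level tuple that ast.c fills in again before each call, which, as noted, cannot be reproduced from Python code. Here a list stands in for the reused argument container, and `RecordingCallee`, `call_reusing_buffer`, and `call_with_fresh_args` are hypothetical names invented for the illustration.

```python
class RecordingCallee:
    """Hypothetical callee that keeps a reference to every argument
    container it is handed (as a C-implemented fake normalize() could)."""

    def __init__(self):
        self.seen = []

    def __call__(self, args):
        self.seen.append(args)
        return args[1]


def call_reusing_buffer(func, identifiers):
    # One container is reused for every call; only slot 1 is overwritten.
    shared = ["NFKC", None]
    for ident in identifiers:
        shared[1] = ident
        func(shared)


def call_with_fresh_args(func, identifiers):
    # A new argument container is built for every call.
    for ident in identifiers:
        func(("NFKC", ident))


reused = RecordingCallee()
call_reusing_buffer(reused, ["a", "b"])

fresh = RecordingCallee()
call_with_fresh_args(fresh, ["a", "b"])
```

With the shared container, every saved reference points at the same object, so the callee's record of the first call changes when the second call is prepared. With fresh arguments, each saved reference keeps the values it was called with.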
Or just revert your last change, and I'll create a separate PR. These bugs are related to the same function, but can be fixed separately.
I understand that we must conceal the tuple c->c_normalize_args from the user.
Doesn't my patch conceal it? I passed only the tuple's items array to _PyObject_FastCall(), so even it doesn't have access to the tuple.
For example, _PyObject_FastCall() might eventually call function_code_fastcall(), which would copy the args C array into f_localsplus, or it might call _PyStack_AsTuple(), which would copy the args C array into a new tuple.
Or is my fix bad because it uses the internal structure of tuple?
Yes, your patch conceals it. But using c->c_normalize_args as a buffer is suboptimal. You don't need a heap allocation, and the code would be simpler if you used a stack variable.
cool, thanks for the explanation :)
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase "I didn't expect the Spanish Inquisition!". And if you don't make the requested changes, you will be poked with soft cushions!
I didn't expect the Spanish Inquisition!
Nobody expects the Spanish Inquisition! @serhiy-storchaka: please review the changes made to this pull request.
Python/ast.c
Outdated
id2 = _PyObject_FastCall(c->c_normalize,
                         ((PyTupleObject *)c->c_normalize_args)->ob_item,
                         2);
PyObject *form = PyUnicode_FromString("NFKC");
Use _Py_IDENTIFIER().
@@ -1,2 +1,2 @@
Fix an assertion failure in case of a bad `unicodedata.normalize()`. Patch
by Oren Milman.
Fixed an assertion failure in Python parser in case of a bad `unicodedata.normalize()`.
Thanks for adding the 'Python parser' part :)
BTW, in https://devguide.python.org/committing/#what-s-new-and-news-entries, the example uses 'Fix ...' (as opposed to 'Fixed ...').
Which phrasing should be used? Or is it unimportant?
I prefer 'Fixed ...'. But my English is really bad. I just use old patterns.
For more about good and bad phrasing, read this: https://mail.python.org/pipermail/python-dev/2011-May/111303.html.
Thanks @orenmn for the PR, and @serhiy-storchaka for merging it 🌮🎉. I'm working now to backport this PR to: 3.6.
GH-3836 is a backport of this pull request to the 3.6 branch.
… a bad unicodedata.normalize(). (pythonGH-3767) (cherry picked from commit 7dc46d8)
ast.c: add a check whether unicodedata.normalize() returned a string.
test_ast.py: add tests to verify that the assertion failure is no more.
https://bugs.python.org/issue31592
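The behaviour the commit message describes can be exercised with a short sketch like the one below. It is not the test that was added to test_ast.py; it assumes an interpreter that includes this fix, uses `unittest.mock.patch.object` to swap in the fake temporarily, and the helper names `bad_normalize` and `parse_with_bad_normalize` are made up for illustration.

```python
import ast
import unicodedata
from unittest import mock


def bad_normalize(*args):
    # Deliberately breaks the contract: normalize() should return a str.
    return None


def parse_with_bad_normalize(source):
    """Parse `source` while unicodedata.normalize is replaced by the fake.

    Returns the exception raised by ast.parse(), or None if parsing
    succeeded.
    """
    with mock.patch.object(unicodedata, "normalize", bad_normalize):
        try:
            ast.parse(source)
        except Exception as exc:
            return exc
    return None


# U+03D5 is a non-ASCII identifier, so the parser has to normalize it.
error = parse_with_bad_normalize("\u03d5")

# Once the real normalize() is restored, the identifier is NFKC-normalized.
normalized_name = ast.parse("\u03d5").body[0].value.id
```

With the fix in place, the bad return value should surface as an ordinary `TypeError` rather than tripping an assertion in the parser. A non-ASCII identifier is used because only those need to be normalized.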