8000 [3.9] bpo-45494: Fix parser crash when reporting errors involving invalid continuation characters (GH-28993) by ambv · Pull Request #29071 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

[3.9] bpo-45494: Fix parser crash when reporting errors involving invalid continuation characters (GH-28993) #29071

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 20, 2021

Conversation

ambv
Copy link
Contributor
@ambv ambv commented Oct 19, 2021

There are two errors that this commit fixes:

  • The parser was not correctly computing the offset and the string
    source for E_LINECONT errors due to the incorrect usage of strtok().
  • The parser was not correctly unwinding the call stack when a tokenizer
    exception happened in rules involving optionals ('?', [...]) as we
    always make them return valid results by using the comma operator. We
    need to check first if we don't have an error before continuing..
    (cherry picked from commit a106343)

Co-authored-by: Pablo Galindo Salgado Pablogsal@gmail.com

https://bugs.python.org/issue45494

…alid continuation characters (pythonGH-28993)

There are two errors that this commit fixes:

* The parser was not correctly computing the offset and the string
  source for E_LINECONT errors due to the incorrect usage of strtok().
* The parser was not correctly unwinding the call stack when a tokenizer
  exception happened in rules involving optionals ('?', [...]) as we
  always make them return valid results by using the comma operator. We
  need to check first if we don't have an error before continuing..
(cherry picked from commit a106343)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
@ambv
Copy link
Contributor Author
ambv commented Oct 19, 2021

On Python 3.9, the actual position reported is (3, 22) instead of the expected (2, 2). That's both before and after this PR. The difference is that the PR solves an assertion failure (see my comment on the issue) so it's worth landing it anyway. I'll be changing the test here to point at (3, 22) then.

def test_error_offset_continuation_characters(self):
check = self.check
check('"\\\n"(1 for c in I,\\\n\\', 2, 2)
check('"\\\n"(1 for c in I,\\\n\\', 3, 22)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unfortunately a regression:

>>> try:
...   compile('"\\\n"(1 for c in I,\\\n\\', "", "exec")
... except SyntaxError as e:
...   f = e
...
>>> f.lineno
2
>>> f.offset
3

Copy link
Member
@pablogsal pablogsal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to figure out why this points to some random place :(

@bedevere-bot
Copy link

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

And if you don't make the requested changes, you will be put in the comfy chair!

@pablogsal
Copy link
Member

That's both before and after this PR.
Maybe I am missing something but I am not getting that:

Python 3.9.7 (default, Oct 10 2021, 15:13:22)
[GCC 11.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> try:
...   compile('"\\\n"(1 for c in I,\\\n\\', "", "exec")
... except SyntaxError as e:
...   f = e
...
>>> f
SyntaxError('Generator expression must be parenthesized', ('', 2, 3, '"\\\n"(1 for c in I,\\\n\\\n'))
>>> f.offset
3
>>> f.lineno
2

Copy link
Member
@pablogsal pablogsal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand the old error is wrong:

Python 3.9.7 (default, Oct 10 2021, 15:13:22)
[GCC 11.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> compile('"\\\n"(1 for c in I,\\\n\\', "","exec")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "", line 2
    "\
"(1 for c in I,\
\
      ^
SyntaxError: Generator expression must be parenthesized

while the one in this PR is correct but the offsets are badly computed:

Python 3.10.0 (default, Oct  4 2021, 22:36:16) [GCC 11.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> compile('"\\\n"(1 for c in I,\\\n\\', "","exec")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "", line 2
    "(1 for c in I,\
     ^
SyntaxError: '(' was never closed
>>>

@ambv
Copy link
Contributor Author
ambv commented Oct 20, 2021

Maybe I am missing something but I am not getting that

Right, I read (2, 3) as (3, 2) in the output. So yes, definitely there's a regression here.

@pablogsal
Copy link
Member

I have been investigating and this is actually a bug in the tokenizer that was solved by the big tokenizer refactor I did for 3.10. Backporting that is going to be a pain and quite invasive unfortunately :(

Copy link
Member
@pablogsal pablogsal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the complication, I am ok landing this as is.

@bedevere-bot
Copy link

@ambv: Please replace # with GH- in the commit message next time. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0