8000 bpo-40334: Support Python 3.6 in the PEG generator by serhiy-storchaka · Pull Request #19786 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

bpo-40334: Support Python 3.6 in the PEG generator #19786

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

serhiy-storchaka
Copy link
Member
@serhiy-storchaka serhiy-storchaka commented Apr 29, 2020

@pablogsal
Copy link
Member
pablogsal commented Apr 29, 2020

Thanks @serhiy-storchaka for helping with this :)

Two things first:

  • Something is odd, there should be no changes to the c generated files: Parser/pegen/parse.c has changed so something looks incorrect.
  • (Edit: just saw that is regenerated) You need to regenerate the metaparser: run regen-metaparser in the Tools/peg_generator repo. Maybe we should add this target to the regen-all target as well.

@serhiy-storchaka
Copy link
Member Author

The problem was in the subtle difference in the comma interpretation between

(a := b,)

and

a = b,

@serhiy-storchaka
Copy link
Member Author

Changes:

  • Get rid of :=.
  • Fix forward references in annotations.
  • import EXACT_TOKEN_TYPES from tokenize instead of token.

@gvanrossum
Copy link
Member

The grammar parser is generated from metagrammar. I didn't read the whole diff -- do you also fix the Python generator not to generate walruses? How important is it to be able to bootstrap with Python < 3.8? I rather like the way the generator generates walruses...

@pablogsal
Copy link
Member

Would it be possible to avoid the long indent blocks to substitute the walrus? They will scale with the number of alternatives in a rule and although is generated code, I find that a bit unpleasant as it makes the generated code much harder to read and reason about IMHO :(

@serhiy-storchaka
Copy link
Member Author

I think it is important to not depend on the latest version of Python. This adds obstacles for new contributors. In the past all scripts used for building Python was conservative enough. I think that we should support at least the oldest Python version for which regular bugfixes are accepted (currently it is 3.6). In addition, this can help to catch some bugs (as using token data from a wrong Python version). This PR allows to use 3.6, but it fails on 3.7, -- therefore there are some bugs in the generator.

The generated code with walruses is not so easy to read. It is subjective, but it uses some tricks (as in (opt := self.memoflag(),), note the comma) which I afraid are hard to notice. This PR does minimal changes, but it reduces indentation in some cases (for cut = True and for the trick mentioned above). It is possible to optimize the generated code even more (eliminate unused variables, collapse some ifs, merge duplicated code), but I prefer to do this in separate PR.

@pablogsal
Copy link
Member
pablogsal commented Apr 29, 2020

This PR allows to use 3.6, but it fails on 3.7

Oh, interesting! What is the failure in 3.7? Something to do with the ASYNC tokens?

. It is possible to optimize the generated code even more (eliminate unused variables, collapse some ifs, merge duplicated code), but I prefer to do this in separate PR.

I still would like to not have one indentation level per alternative in this PR. I am refering to this pattern in the generated code:

 if rulename:
            opt = self.memoflag()
            literal = self.expect(":")
            if literal:
                alts = self.alts()
                if alts:
                    newline = self.expect('NEWLINE')
                    if newline:
                        indent = self.expect('INDENT')
                        if indent:
                            more_alts = self.more_alts()
                            if more_alts:
                                dedent = self.expect('DEDENT')
                                if dedent:
                                    return Rule ( rulename [ 0 ] , rulename [ 1 ] , Rhs ( alts . alts + more_alts . alts ) , memo = opt )

@gvanrossum
Copy link
Member

Honestly this is horrible. Please withdraw this.

New contributors won't need to regenerate the parser unless they're hacking on the grammar, in which case I think requiring 3.8 is not so bad.

@serhiy-storchaka
Copy link
Member Author

What is the failure in 3.7? Something to do with the ASYNC tokens?

Yes, it is. You can try yourself with this PR.

pegen.grammar.GrammarError: Dangling reference to rule 'ASYNC'

I am refering to this pattern in the generated code:

It is perhaps an example with the largest nesting level. We can get rid from three levels. And after using PEP 8 style for generated code it would look better:

        if rulename:
            opt = self.memoflag()
            if self.expect(":")
                alts = self.alts()
                if alts and self.expect('NEWLINE') and self.expect('INDENT'):
                    more_alts = self.more_alts()
                    if more_alts and self.expect('DEDENT'):
                        return Rule(rulename[0], rulename[1], Rhs(alts.alts + more_alts.alts), memo=opt)

The current code is not an example of beauty either:

        if (
            (rulename := self.rulename())
            and
            (opt := self.memoflag(),)
            and
            (literal := self.expect(":"))
            and
            (alts := self.alts())
            and
            (newline := self.expect('NEWLINE'))
            and
            (indent := self.expect('INDENT'))
            and
            (more_alts := self.more_alts())
            and
            (dedent := self.expect('DEDENT'))
        ):
            return Rule ( rulename [ 0 ] , rulename [ 1 ] , Rhs ( alts . alts + more_alts . alts ) , memo = opt )

Further we can merge common code and generate more optimal and compact code for the function (https://github.com/python/cpython/pull/19786/files#diff-97f24c073d6407aacd17575fa57da559L169-L225). 21 lines instead of 56!

        mark = self.mark()
        rulename = self.rulename()
        if rulename:
            opt = self.memoflag()
            if self.expect(":")
                mark2 = self.mark()
                alts = self.alts()
                if alts and self.expect('NEWLINE') and self.expect('INDENT'):
                    more_alts = self.more_alts()
                    if more_alts and self.expect('DEDENT'):
                        return Rule(rulename[0], rulename[1], Rhs(alts.alts + more_alts.alts), memo=opt)
                self.reset(mark2)
                if self.expect('NEWLINE') and self.expect('INDENT'):
                    more_alts = self.more_alts()
                    if more_alts and self.expect('DEDENT'):
                        return Rule(rulename[0], rulename[1], more_alts, memo=opt)
                self.reset(mark2)
                alts = self.alts()
                if alts and self.expect('NEWLINE'):
                    return Rule(rulename[0], rulename[1], alts, memo=opt)
        self.reset(mark)

But we need to use nesting for this. It is not possible to write efficient and readable code using only one level of nesting.

New contributors need to regenerate some code when they fix docstring in a function implemented with Argument Clinic. And it is easier to run make regen-all.

The purposes of this PR:

  • Make easier to build Python (including for new contributors).
  • More diverse testing of the grammar generator.

If you think it's not worth it, I'll withdraw this PR.

@gvanrossum
Copy link
Member

IIUC there was a buildbot using 3.6? Let's not worry about new contributors until we see complaints.

@serhiy-storchaka
Copy link
Member Author

I do not know what relation buildbots using 3.6 have to this issue. Are they failing with the PEG generator?

I am not a new contributor, but it touched me when I was not able to build 3.9.

@gvanrossum
Copy link
Member

I'm not sure myself, I was speculating about the buildbots (IIRC there was a buildbot that had a failure due to having 3.6 installed, but maybe it was a different issue).

Anyway, I still really dislike this solution. Surely you can manage to install 3.8 on your machine. I'll close it now to spare us the pain of extended discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0