bpo-40334: Support Python 3.6 in the PEG generator #19786

serhiy-storchaka · 2020-04-29T07:22:33Z

https://bugs.python.org/issue40334

pablogsal · 2020-04-29T09:23:28Z

Thanks @serhiy-storchaka for helping with this :)

Two things first:

Something is odd, there should be no changes to the c generated files: Parser/pegen/parse.c has changed so something looks incorrect.
(Edit: just saw that is regenerated) ~~You need to regenerate the metaparser: run regen-metaparser in the Tools/peg_generator repo. Maybe we should add this target to the regen-all target as well.~~

serhiy-storchaka · 2020-04-29T13:25:24Z

The problem was in the subtle difference in the comma interpretation between

(a := b,)

and

a = b,

serhiy-storchaka · 2020-04-29T16:08:53Z

Changes:

Get rid of :=.
Fix forward references in annotations.
import EXACT_TOKEN_TYPES from tokenize instead of token.

gvanrossum · 2020-04-29T19:20:53Z

The grammar parser is generated from metagrammar. I didn't read the whole diff -- do you also fix the Python generator not to generate walruses? How important is it to be able to bootstrap with Python < 3.8? I rather like the way the generator generates walruses...

pablogsal · 2020-04-29T19:24:16Z

Would it be possible to avoid the long indent blocks to substitute the walrus? They will scale with the number of alternatives in a rule and although is generated code, I find that a bit unpleasant as it makes the generated code much harder to read and reason about IMHO :(

serhiy-storchaka · 2020-04-29T20:07:57Z

I think it is important to not depend on the latest version of Python. This adds obstacles for new contributors. In the past all scripts used for building Python was conservative enough. I think that we should support at least the oldest Python version for which regular bugfixes are accepted (currently it is 3.6). In addition, this can help to catch some bugs (as using token data from a wrong Python version). This PR allows to use 3.6, but it fails on 3.7, -- therefore there are some bugs in the generator.

The generated code with walruses is not so easy to read. It is subjective, but it uses some tricks (as in (opt := self.memoflag(),), note the comma) which I afraid are hard to notice. This PR does minimal changes, but it reduces indentation in some cases (for cut = True and for the trick mentioned above). It is possible to optimize the generated code even more (eliminate unused variables, collapse some ifs, merge duplicated code), but I prefer to do this in separate PR.

pablogsal · 2020-04-29T20:29:48Z

This PR allows to use 3.6, but it fails on 3.7

Oh, interesting! What is the failure in 3.7? Something to do with the ASYNC tokens?

. It is possible to optimize the generated code even more (eliminate unused variables, collapse some ifs, merge duplicated code), but I prefer to do this in separate PR.

I still would like to not have one indentation level per alternative in this PR. I am refering to this pattern in the generated code:

 if rulename:
            opt = self.memoflag()
            literal = self.expect(":")
            if literal:
                alts = self.alts()
                if alts:
                    newline = self.expect('NEWLINE')
                    if newline:
                        indent = self.expect('INDENT')
                        if indent:
                            more_alts = self.more_alts()
                            if more_alts:
                                dedent = self.expect('DEDENT')
                                if dedent:
                                    return Rule ( rulename [ 0 ] , rulename [ 1 ] , Rhs ( alts . alts + more_alts . alts ) , memo = opt )

gvanrossum · 2020-04-29T20:38:21Z

Honestly this is horrible. Please withdraw this.

New contributors won't need to regenerate the parser unless they're hacking on the grammar, in which case I think requiring 3.8 is not so bad.

serhiy-storchaka · 2020-04-29T21:36:41Z

What is the failure in 3.7? Something to do with the ASYNC tokens?

Yes, it is. You can try yourself with this PR.

pegen.grammar.GrammarError: Dangling reference to rule 'ASYNC'

I am refering to this pattern in the generated code:

It is perhaps an example with the largest nesting level. We can get rid from three levels. And after using PEP 8 style for generated code it would look better:

        if rulename:
            opt = self.memoflag()
            if self.expect(":")
                alts = self.alts()
                if alts and self.expect('NEWLINE') and self.expect('INDENT'):
                    more_alts = self.more_alts()
                    if more_alts and self.expect('DEDENT'):
                        return Rule(rulename[0], rulename[1], Rhs(alts.alts + more_alts.alts), memo=opt)

The current code is not an example of beauty either:

        if (
            (rulename := self.rulename())
            and
            (opt := self.memoflag(),)
            and
            (literal := self.expect(":"))
            and
            (alts := self.alts())
            and
            (newline := self.expect('NEWLINE'))
            and
            (indent := self.expect('INDENT'))
            and
            (more_alts := self.more_alts())
            and
            (dedent := self.expect('DEDENT'))
        ):
            return Rule ( rulename [ 0 ] , rulename [ 1 ] , Rhs ( alts . alts + more_alts . alts ) , memo = opt )

Further we can merge common code and generate more optimal and compact code for the function (https://github.com/python/cpython/pull/19786/files#diff-97f24c073d6407aacd17575fa57da559L169-L225). 21 lines instead of 56!

        mark = self.mark()
        rulename = self.rulename()
        if rulename:
            opt = self.memoflag()
            if self.expect(":")
                mark2 = self.mark()
                alts = self.alts()
                if alts and self.expect('NEWLINE') and self.expect('INDENT'):
                    more_alts = self.more_alts()
                    if more_alts and self.expect('DEDENT'):
                        return Rule(rulename[0], rulename[1], Rhs(alts.alts + more_alts.alts), memo=opt)
                self.reset(mark2)
                if self.expect('NEWLINE') and self.expect('INDENT'):
                    more_alts = self.more_alts()
                    if more_alts and self.expect('DEDENT'):
                        return Rule(rulename[0], rulename[1], more_alts, memo=opt)
                self.reset(mark2)
                alts = self.alts()
                if alts and self.expect('NEWLINE'):
                    return Rule(rulename[0], rulename[1], alts, memo=opt)
        self.reset(mark)

But we need to use nesting for this. It is not possible to write efficient and readable code using only one level of nesting.

New contributors need to regenerate some code when they fix docstring in a function implemented with Argument Clinic. And it is easier to run make regen-all.

The purposes of this PR:

Make easier to build Python (including for new contributors).
More diverse testing of the grammar generator.

If you think it's not worth it, I'll withdraw this PR.

gvanrossum · 2020-04-29T22:27:29Z

IIUC there was a buildbot using 3.6? Let's not worry about new contributors until we see complaints.

serhiy-storchaka · 2020-04-30T07:00:02Z

I do not know what relation buildbots using 3.6 have to this issue. Are they failing with the PEG generator?

I am not a new contributor, but it touched me when I was not able to build 3.9.

gvanrossum · 2020-04-30T18:39:24Z

I'm not sure myself, I was speculating about the buildbots (IIRC there was a buildbot that had a failure due to having 3.6 installed, but maybe it was a different issue).

Anyway, I still really dislike this solution. Surely you can manage to install 3.8 on your machine. I'll close it now to spare us the pain of extended discussion.

serhiy-storchaka added 4 commits April 27, 2020 20:55

Support Python 3.7 in the PEG generator

e2fd9de

Merge branch 'master' into wip-peg_generator-3.7

592f0e5

Fix testutil.py.

3d09d69

Support Python 3.6 in the PEG generator

d8d163c

serhiy-storchaka added the type-feature A feature request or enhancement label Apr 29, 2020

serhiy-storchaka requested a review from pablogsal April 29, 2020 07:22

the-knights-who-say-ni added the CLA signed label Apr 29, 2020

bedevere-bot added the awaiting core review label Apr 29, 2020

lysnikolaou mentioned this pull request Apr 29, 2020

bpo-40334: refactor and cleanup for the PEG generators #19775

Merged

serhiy-storchaka added 2 commits April 29, 2020 14:03

Merge branch 'master' into peg_generator-3.6

dbb859a

Fix handling of optional items.

824bd21

Minor fixes.

e9a8ef4

pablogsal requested review from lysnikolaou and gvanrossum April 29, 2020 19:12

pablogsal added the skip news label Apr 29, 2020

gvanrossum closed this Apr 30, 2020

gvanrossum mentioned this pull request May 9, 2020

[WIP] Python 3.9 parser python/typed_ast#137

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

bpo-40334: Support Python 3.6 in the PEG generator #19786

bpo-40334: Support Python 3.6 in the PEG generator #19786

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

bpo-40334: Support Python 3.6 in the PEG generator #19786

bpo-40334: Support Python 3.6 in the PEG generator #19786

Uh oh!

Conversation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!