10000 String annotations [PEP 563] by gvanrossum · Pull Request #4390 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

String annotations [PEP 563] #4390

8000
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jan 26, 2018
Merged

String annotations [PEP 563] #4390

merged 7 commits into from
Jan 26, 2018

Conversation

gvanrossum
Copy link
Member
@gvanrossum gvanrossum commented Nov 14, 2017

This is unfinished work by @ambv. UPDATE: this is ready for review now.

I'm adding it here because patching and reviewing are easier (for me anyway) when it's in PR form. Also, @serhiy-storchaka your eye would be appreciated, esp. for the hairy AST unparsing code in C. (Also, if you had to do this from scratch, would it be easier to unparse the CST instead?)

@gvanrossum
Copy link
Member Author

I'm guessing there are still crasher bugs in here... E.g.

>>> from __future__ import string_annotations
>>> def f(a: list[str]): pass
... 
Segmentation fault: 11

Copy link
Member
@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are crashes because the work is unfinished. Some parts still are not implemented (in particularly subscribing). Error checking is minimal if exists.

All concatenation has quadratic time. I think it is worth to implement simple accumulator that uses overallocated array and makes concatenations for linear time. Or you can reuse existing _PyBytesWriter or _PyUnicodeWriter.

Yes, maybe unparsing the CST could be much simpler. But we don't have feature flags at this stage.

Python/compile.c Outdated
ann_as_str = PyUnicode_InternFromString(ann_as_charp);
if (!ann_as_str)
return 0;
ADDOP_O(c, LOAD_CONST, ann_as_str, consts);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ann_as_str is leaked. Use ADDOP_N.

Python/compile.c Outdated
char *ann_as_charp;
static PyObject *ann_as_str;

ann_as_charp = PyAST_StrFromAstExpr(annotation, 1, 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if not NULL.

Python/ast.c Outdated
wrap_parens(char * str)
{
char *result;
result = PyMem_RawMalloc(strlen(str) + 3);
Copy link
Member
@serhiy-storchaka serhiy-storchaka Nov 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Results of all allocations should be checked for NULL.

Python/ast.c Outdated
return str_for_ast_list(e);
case Tuple_kind:
return str_for_ast_tuple(e, omit_parens);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add the default case. At least for debugging.

@serhiy-storchaka
Copy link
Member

If unparse the AST I would move the code into a separate file. There will be a lot of code, comparable with the size of compile.c.

@ambv
Copy link
Contributor
ambv commented Nov 18, 2017

Alright, I'm going to:

  • change the future name back to annotations (yay!)
  • stick to AST unparsing
  • move this work to a separate file
  • change concatenation to use the accumulator pattern (reusing _PyBytesWriter would be great)
  • finish the functionality

8000
@gvanrossum
Copy link
Member Author

Sounds great! I am pretty happy to see where this is going, I'd like to see it in a solid state by the time beta 1 comes around.

@ambv ambv force-pushed the string_annotations branch from b49dfa1 to c56d5bd Compare November 18, 2017 22:48
@ambv ambv changed the title [WIP] String annotations String annotations [PEP 563] Nov 18, 2017
@ambv ambv self-assigned this Nov 18, 2017
@ambv
Copy link
Contributor
ambv commented Nov 18, 2017

All comments from Serhiy's review acted upon, the import renamed back to "annotations" like commented above. The only missing piece in the implementation is f-string support but my battery will die soon so I wanted to push this out for you to look at.

Shouldn't segfault anymore, trying to use f-strings raises an exception instead.

@ambv ambv force-pushed the string_annotations branch from c56d5bd to 1e71626 Compare November 18, 2017 22:57
@ambv
Copy link
Contributor
ambv commented Nov 18, 2017

Hm, the clang Travis CI build is failing due to invalid whitespace. make patchcheck is complaining about Include/code.h, suggesting the following diff:
https://gist.github.com/ambv/9f0874d56c7cf0d5ce98dfef38de0ce6

This diff is suggesting a lot of changes but not on the single line that I modified ¯\_(ツ)_/¯

@ambv
Copy link
Contributor
ambv commented Nov 18, 2017

I added a commit that fixes the whitespace according to the generated patch above so that I can see Travis CI passing. We can decide what to do with it later.

AppVeyor is failing because we need to modify PCbuild/* but I don't have access to a Windows box.

NEWS entry is not there yet since Blurb is tied to BPO issues and I'm wondering whether creating dummy issues for PEP work isn't redundant? I asked @larryhastings, we can also deal with this later.

@gvanrossum gvanrossum requested a review from a team as a code owner November 18, 2017 23:28
@ambv ambv added the skip news label Nov 18, 2017
@ambv ambv force-pushed the string_annotations branch from 4cdb25d to 98231af Compare November 20, 2017 19:39
@ambv ambv added skip news and removed skip news labels Nov 20, 2017
@ambv
Copy link
Contributor
ambv commented Nov 20, 2017

Changes:

  • Rebased on top of latest master
  • Added a NEWS entry (manually since this is a PEP; the bot doesn't recognize this, therefore the skip news label has to stay)
  • Moved the changes to PCbuild to the respective implementation commit ("Implement unparsing...")

This is ready for another review pass, @serhiy-storchaka. The only bit left is f-strings which is going to be a bit tedious so I'm waiting with it after a new round of feedback :-)

@ilevkivskyi
Copy link
Member

How, do we organize the updates to typing.get_type_hints() to work with "doubly quoted" strings? I mean this should still work with the __future__ import:

def f() -> List['int']:
    ...

assert get_type_hints(f)['return'] == List[int]

I suppose this can be part of the same PR, since this is not necessary in the backported version on PyPI. Also my updates to typing following PEP 560 should not have merge conflicts.

@ambv
Copy link
Contributor
ambv commented Nov 22, 2017

@ilevkivskyi, I want to fix get_type_hints() separately since there's really no reason why accepting "List['int']" shouldn't be supported even today. Same with fixing the self-class reference as you suggested on python-dev (I think I'd do it with a ChainMap though), fixing the forward ref cache conflicts, etc.

@ilevkivskyi
Copy link
Member

@ambv OK, I am fine with this as well.

@serhiy-storchaka
Copy link
Member

It will take a time for making a review of such large change. But one comment I can say now.

The unparser adds parenthesis for grouping subexpression. They are added even if not strictly needed, e.g. in a + (b * (c ** d)). The problem is not that redundant parenthesis makes an expression less readable. The problem is that they increase the stack consumption when parse the expression again. It is possible that the original expression can be parsed, but parsing the unparsed expression will fail or even crash.

I already encountered with similar problem when worked on the parser of plural form expressions in gettext.py. A C-like syntax is parsed and converted to Python syntax, and the result is evaluated. I minimized the use of parenthesis. If the subexpression operator has higher priority than the operator of the outer expression, parenthesis are not added.

This is not a blocker, and we can solve this problem later, but you can think about this while I'm making a review.

@gvanrossum
Copy link
Member Author
gvanrossum commented Nov 22, 2017 via email

@ambv
Copy link
Contributor
ambv commented Nov 23, 2017 via email

@serhiy-storchaka
Copy link
Member

Yes, in case of gettext the one of purposes was to guard against a malicious input. In the case of annotations it is less likely. The other purpose -- speeding up the compilation.

This problem can be solved by assigning the numerical priority level to expressions and omitting parenthesis only if the current priority level is higher then the level of a super-expression (the sub-expression rather of a super-expression is responsible for adding parenthesis).

I have yet two questions.

  1. Is the performance important here (I afraid yes)? The Python implementation would be simpler and more reliable. But much slower.

  2. Do we need to support the full Python expression syntax? In particular arithmetic operations. Or it would be enough to support only the small subset used in type annotations? Names, attributes, indexing, tuples, what more?

@gvanrossum
Copy link
Member Author
gvanrossum commented Nov 23, 2017 via email

Copy link
Member
@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments are mostly style comments (PEP 7) and suggestions for cleaning up the code, but there are several errors.