8000 Pluggable system for generating types from docstrings by chadrik · Pull Request #2240 · python/mypy · GitHub
[go: up one dir, main page]

Skip to content

Pluggable system for generating types from docstrings #2240

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Next Next commit
Rough draft of pluggable system for producing types from docstrings
  • Loading branch information
chadrik committed Nov 3, 2016
commit 0a47ddbd5130517ec726b6c4812d8f1bb16f0a95
27 changes: 27 additions & 0 deletions mypy/docstrings.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
from typing import List, Tuple, Dict
from mypy.types import Type, CallableType, AnyType
from mypy.nodes import Argument


def parse_docstring(docstring: str, line: int) -> Tuple[Dict[str, Type], Type]:
"""
Parse a docstring and return type representations. This function can
be overridden by third-party tools which aim to add typing via docstrings.

Returns a 2-tuple: dictionary of arg name to Type, and return Type.
Copy link
Member
@gvanrossum gvanrossum Oct 11, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if Type is the right thing to return here. Turning a string representing a type as found in the source code into a Type object is a fairly complicated process, and we already have code for that (the code that parses type comments). Maybe this should just return strings?

As a nit, I'd use the same convention for the return type as used in __annotations__ and a few of inspect's APIs -- just use a key "return" since that can't be an argument name anyway. (OTOH maybe the structure used here is more compatible with mypy's internal conventions, so I don't insist on this.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if Type is the right thing to return here. Turning a string representing a type as found in the source code into a Type object is a fairly complicated process, and we already have code for that (the code that parses type comments).

I'm using mypy.parsetype.parse_str_as_type to handle the conversion within my little experimental project, so it's pretty straight-forward. Returning strings means that mypy will have to have more of an opinion about how to deal with error handling and such, but it would avoid the need for the parse_docstring function to take a line number (which is there explicitly for passing to parse_str_as_type). I'll look into whether I can return Dict[str, str] while still maintaining the current functionality in my project.

As a nit, I'd use the same convention for the return type as used in annotations and a few of inspect's APIs -- just use a key "return" since that can't be an argument name anyway.

Yeah, I like that. It's a simpler structure to document.

"""
return None, None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the proper Optional to the return type (it's not enforced yet, but it will be in the future, and it's needed here).



def make_callable(args: List[Argument], type_map: Dict[str, Type],
ret_type: Type) -> CallableType:
if type_map is not None:
Copy link
Member
@gvanrossum gvanrossum Oct 11, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this function in this file? It seems to be used only by parse.py so it really belongs there, unless you also plan to monkey-patch this.

There's a missing Optional in the signature. But it almost feels like the caller should check whether type_map is None and not even call this.

(Also, why is this only needed by parser.py and not by fastparse*.py?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this function in this file?

I was trying to keep the docstring-related footprint in the parser modules to a minimum, and I thought it could also be useful to override, although in my personal experiments I haven't needed it.

why is this only needed by parse.py and not by fastparse*.py?

Conceptually these docstring-annotations are alternatives to comment-annotations, so I was able to minimize changes to code by using the same structures that would have delivered comment-annotations to instead deliver docstring-annotations. From the perspective of the calling code it's the same thing. In parse.py comment-annotations are expected to provide a "kind" for each argument (and there's a bit of code to ensure that the kinds are compatible with those parsed from the actual function signature) whereas fastparse always uses the kind from the function signature. I guess the authors of parse.py found it convenient to use an existing object -- CallableType -- to hold the extra data.

If we ditch parse.py support then it's a moot point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, it's because you're emulating the result of parse_type_comment() which in turn calls parse_signature() which returns a Callable.

So here's another exhortation to return strings and let mypy parse the strings. Strings are what you have in your docstring and strings provide a much more stable API between an app (e.g. mypy) and an extension (your code for parsing docstrings) than Type objects. If we change the signature of Callable or we change the meaning of Type, your extension will break, and we can't make any promises that such internals won't change (not until we've had much more experience with extensions). But strings are pretty stable and you can always construct strings. So I'm saying you should just return a string that's got the same format as a type comment, i.e. # type: (blah, blah) -> blah (but without the # type: prefix).

arg_kinds = [arg.kind for arg in args]
arg_names = [arg.variable.name() for arg in args]
arg_types = [type_map.get(name) for name in arg_names]

return CallableType([a if a is not None else AnyType() for a in arg_types],
arg_kinds,
arg_names,
ret_type, None,
is_ellipsis_args=False)
12 changes: 11 additions & 1 deletion mypy/fastparse.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
from functools import wraps
from inspect import cleandoc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer "import inspect" here. So the code that uses it is clear about the origin of the cleandoc() function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do. I noticed that nearly all functions in these modules were imported using from x import y so I was just mimicking the prevailing style, but glad to change it.

import sys

from typing import Tuple, Union, TypeVar, Callable, Sequence, Optional, Any, cast, List
Expand All @@ -22,6 +23,7 @@
)
from mypy import defaults
from mypy import experiments
from mypy import docstrings
from mypy.errors import Errors

try:
Expand Down Expand Up @@ -287,7 +289,15 @@ def do_func_def(self, n: Union[ast35.FunctionDef, ast35.AsyncFunctionDef],
else:
arg_types = [a.type_annotation for a in args]
return_type = TypeConverter(line=n.lineno).visit(n.returns)

# docstrings
if not any(arg_types) and return_type is None:
doc = ast35.get_docstring(n, clean=False)
if doc:
doc = cleandoc(doc.decode('unicode_escape'))
type_map, rtype = docstrings.parse_docstring(doc, n.lineno)
if type_map is not None:
arg_types = [type_map.get(name) for name in arg_names]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd assert that this has the right length, otherwise you may get crazy crashes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The length of arg_types comes indirectly, via arg_names, from the from the ast-parsed args so it should always be correct:

        args = self.transform_args(n.args, n.lineno)
        arg_kinds = [arg.kind for arg in args]
        arg_names = [arg.variable.name() for arg in args]

Although this reminds me that I did want to have something that checks that all types in the type map were used exactly once.

return_type = rtype
for arg, arg_type in zip(args, arg_types):
self.set_type_optional(arg_type, arg.initializer)

Expand Down
12 changes: 11 additions & 1 deletion mypy/fastparse2.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
two in a typesafe way.
"""
from functools import wraps
from inspect import cleandoc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer used.

import sys

from typing import Tuple, Union, TypeVar, Callable, Sequence, Optional, Any, cast, List
Expand All @@ -39,6 +40,7 @@
from mypy import experiments
from mypy.errors import Errors
from mypy.fastparse import TypeConverter, TypeCommentParseError
from mypy import docstrings

try:
from typed_ast import ast27
Expand Down Expand Up @@ -293,7 +295,15 @@ def visit_FunctionDef(self, n: ast27.FunctionDef) -> Statement:
else:
arg_types = [a.type_annotation for a in args]
return_type = converter.visit(None)

# docstrings
if not any(arg_types) and return_type is None:
doc = ast27.get_docstring(n, clean=False)
if doc:
doc = cleandoc(doc.decode('unicode_escape'))
type_map, rtype = docstrings.parse_docstring(doc, n.lineno)
if type_map is not None:
arg_types = [type_map.get(name) for name in arg_names]
return_type = rtype
for arg, arg_type in zip(args, arg_types):
self.set_type_optional(arg_type, arg.initializer)

Expand Down
13 changes: 11 additions & 2 deletions mypy/parse.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
"""

import re
from inspect import cleandoc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This version of the parser is doomed, we'll remove it as soon as we can use the fast parser on all platforms. Maybe you could skip changing this file and just state that your feature depends on --fast-parser?


from typing import List, Tuple, Set, cast, Union, Optional

Expand Down Expand Up @@ -38,6 +39,7 @@
from mypy.options import Options

from mypy import experiments
from mypy import docstrings


class ParseError(Exception): pass
Expand Down Expand Up @@ -417,7 +419,7 @@ def parse_function(self, no_type_checks: bool = False) -> FuncDef:
arg_kinds = [arg.kind for arg in args]
arg_names = [arg.variable.name() for arg in args]

body, comment_type = self.parse_block(allow_type=True)
body, comment_type = self.parse_block(args=args)
# Potentially insert extra assignment statements to the beginning of the
# body, used to decompose Python 2 tuple arguments.
body.body[:0] = extra_stmts
Expand Down Expand Up @@ -831,7 +833,7 @@ def construct_function_type(self, args: List[Argument], ret_type: Type,

# Parsing statements

def parse_block(self, allow_type: bool = False) -> Tuple[Block, Type]:
def parse_block(self, args: List[Argument]=None) -> Tuple[Block, Type]:
colon = self.expect(':')
if not isinstance(self.current(), Break):
# Block immediately after ':'.
Expand All @@ -854,6 +856,13 @@ def parse_block(self, allow_type: bool = False) -> Tuple[Block, Type]:
brk = self.expect_break()
type = self.parse_type_comment(brk, signature=True)
self.expect_indent()
if args is not None:
cur = self.current()
if type is None and isinstance(cur, StrLit):
type_map, ret_type = docstrings.parse_docstring(
cleandoc(cur.parsed()), cur.line)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized that you call cleandoc() in all parse_docstring() calls. So move that into parse_docstring().

type = docstrings.make_callable(args, type_map, ret_type)

stmt_list = [] # type: List[Statement]
while (not isinstance(self.current(), Dedent) and
not isinstance(self.current(), Eof)):
Expand Down
0