Add type stub generation to cython by ax487 · Pull Request #3818 · cython/cython · GitHub

Add type stub generation to cython #3818

Open
wants to merge 12 commits into
base: master

ax487
Contributor
@ax487 ax487 commented Sep 7, 2020

This was brought up in #3150 by @scoder:

If generating .pyi stubs is not too complex, Cython could do it automatically, yes, similar to how it now generates the embedded signatures in AutoDocTransform. PR welcome.

I have written a very simple prototype to generate stub files, based on a TreeVisitor. Based on this input:

#cython: language_level=3

import typing

import typing as tipi

from typing import List

import package.subpackage

import test.fancy

from .other import Other
from ..other import Test

cdef class TypingTestClass:

    cdef should_not_appear(self, int x):
        pass

    cpdef int cpdef_typed(self, int l, int u, int s = 4):
        return 0

    cpdef cpdef_untyped(self, l):
        pass

    def func_test(self, i: int) -> int:
        return 0

    def func_test_pos(self, *pos):
        return 0

    def func_test_kwds(self, **kwargs):
        return 0

    def func_test_pos_kwds(self, *pos, **kwargs):
        return 0

    def func_with_default(self, value=1) -> int:
        return 0

cdef class Argument:
    pass

cdef class ArgumentUser:
    cpdef func(self, Argument argument):
        pass

def global_pfunc(i: int) -> int:
    return 1

cpdef str global_cpfunc(str i):
    return "asd"

class Outer:

    def outer_method(self, i: int):
        pass

    class Inner:
        def inner_method(self, j):
            return 1

    def __dealloc__(self):
        print("asd")

    def __cinit__(self):
        print("cinit")


class TypingClass:

    def method(self, v: typing.Dict[str, str]) -> typing.List[int]:
        return []

    def other_method(self, v: typing.Tuple[str, int, List[int]]):
        pass


global_typed: int

global_untyped = 0

it generates the following stub file:

# Python stub file generated by Cython 3.0a6

import typing
import typing as tipi
from typing import List
import package.subpackage
import test.fancy
from .other import Other
from ..other import Test
class TypingTestClass:
  def cpdef_typed(self, l: int, u: int, s: int = ...) -> int: ...
  def cpdef_untyped(self, l: object) -> object: ...
  def func_test(self, i: int) -> int: ...
  def func_test_pos(self, *pos): ...
  def func_test_kwds(self, **kwargs): ...
  def func_test_pos_kwds(self, *pos, **kwargs): ...
  def func_with_default(self, value = ...) -> int: ...
  def __reduce_cython__(self): ...
  def __setstate_cython__(self, __pyx_state): ...

class Argument:
  def __reduce_cython__(self): ...
  def __setstate_cython__(self, __pyx_state): ...

class ArgumentUser:
  def func(self, argument: Argument) -> Argument: ...
  def __reduce_cython__(self): ...
  def __setstate_cython__(self, __pyx_state): ...

def global_pfunc(i: int) -> int: ...
def global_cpfunc(i: unicode) -> unicode: ...
class Outer
  def outer_method(self, i: int): ...
  class Inner
    def inner_method(self, j): ...

  def __dealloc__(self): ...
  def __cinit__(self): ...

class TypingClass
  def method(self, v: typing.Dict[str, str]) -> typing.List[int]: ...
  def other_method(self, v: typing.Tuple[str, int, List[int]]): ...

def __pyx_unpickle_TypingTestClass(__pyx_type, __pyx_checksum, __pyx_state): ...
def __pyx_unpickle_Argument(__pyx_type, __pyx_checksum, __pyx_state): ...
def __pyx_unpickle_ArgumentUser(__pyx_type, __pyx_checksum, __pyx_state): ...

The following problems need to be resolved:

  • Determining types for cpdef correctly: I am using a combination of the py_type_name of the PyrexType and the type.name of the node method so far. Unfortunately, for an int, the py_type_name returns '(int, long)' see here. What is more, an object type is probably a little useless...
  • Add decorator support, in particular @property
    • Find out if we need more general support, in particular for callable decorators. I have yet to see any stub file containing one.
  • The right place in the pipeline is not really clear: So far I have added it at the end. But as seen above, cython adds some methods to the classes and the global scope. I don't think those should be part of the stub. Same for __cinit__ and friends.
  • Support typed global variables: Declarations such as global_typed: int without a value are swallowed by the parser, this probably needs to be rectified.
    • Support global python variables
    • Support global cpdef variables
  • Retain type variables (of the form T = TypeVar("T"))
  • Maybe add a directive to enable / disable generation of type stubs?!
  • Actually write type stubs to .pyi file accompanying the compiled extension library (rather than just printing it out)
  • Add some test code
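
For the first item in the list, one possible approach is a small lookup table from C type names to stub annotations, with a fallback for extension types and for anything unknown. This is only a sketch in plain Python: `C_TO_PY_ANNOTATION`, `annotation_for` and `format_arg` are hypothetical names, not part of Cython, and the table is illustrative rather than complete.

```python
# Hypothetical mapping from C-level type names to stub annotations.
# Not Cython API; the entries are illustrative only.
C_TO_PY_ANNOTATION = {
    "int": "int",
    "long": "int",
    "unsigned int": "int",
    "float": "float",
    "double": "float",
    "bint": "bool",
    "str": "str",
    "bytes": "bytes",
}


def annotation_for(c_type_name, known_classes=()):
    """Return a stub annotation for a C type name, or None if unknown."""
    if c_type_name in C_TO_PY_ANNOTATION:
        return C_TO_PY_ANNOTATION[c_type_name]
    if c_type_name in known_classes:
        # Extension types declared in the same module keep their own name.
        return c_type_name
    # Unknown: leave unannotated rather than emit a misleading 'object'.
    return None


def format_arg(name, c_type_name, known_classes=()):
    """Format one argument as it would appear in a stub signature."""
    annotation = annotation_for(c_type_name, known_classes)
    return name if annotation is None else "%s: %s" % (name, annotation)
```

With this table, `format_arg("l", "int")` yields `l: int` instead of the `(int, long)` tuple string, and an untyped or unrecognised argument is simply left without an annotation.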

I will continue to work on the PR, tracking the changes. I would be happy about any feedback...

@scoder
Contributor
scoder commented Sep 8, 2020

Ah, cool, thanks for volunteering to work on this.
However, it seems that you didn't see the Cython.CodeWriter module. It already generates Cython code from a tree, and would be a much better basis for this than a plain TreeVisitor. The DeclarationWriter seems the best starting point. Maybe you can extract some more functionality that you need from its subclasses into reusable mixin classes, e.g. the import handling of the StatementWriter.

@scoder
Contributor
scoder commented Sep 8, 2020

The right place in the pipeline is not really clear

I was considering running it right after type analysis, but I now think that it would really be best to run it at the very end. Multiple code checks are executed shortly before the C code generation phase, and I don't know if users would expect a stub file to be generated (or overwritten) if the translation run fails in the end. We might even consider doing it after the C code generation, because that is the point where we know that everything really went smoothly.
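
The "only write the stub once everything went smoothly" idea can be sketched independently of the Cython pipeline: build the stub text in memory, and move it into place as the very last step. The function names below (`write_stub_atomically`, `finish_compilation`) and the `compile_succeeded` flag are assumptions made for illustration; they are not existing Cython hooks.

```python
import os
import tempfile


def write_stub_atomically(stub_text, target_path):
    """Write stub_text to target_path via a temp file in the same directory."""
    directory = os.path.dirname(os.path.abspath(target_path))
    fd, tmp_path = tempfile.mkstemp(suffix=".pyi.tmp", dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(stub_text)
        # Replaces an existing stub in one step on the same filesystem.
        os.replace(tmp_path, target_path)
    except BaseException:
        # Never leave a half-written temp file behind.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def finish_compilation(compile_succeeded, stub_text, target_path):
    """Only touch the .pyi file when the translation run succeeded."""
    if not compile_succeeded:
        return False
    write_stub_atomically(stub_text, target_path)
    return True
```

A failed run then leaves any previously generated stub untouched, which matches the concern about overwriting a stub when the translation fails.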

cython adds some methods to the classes and the global scope. I don't think those should be part of the stub. Same for __cinit__ and friends.

Those are added somewhat early in the pipeline, so you wouldn't find a place to run the .pyi generation before them. Rather, exclude everything that starts with a double underscore. Or only __pyx_…, if you want.
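
Both variants of that filter are easy to express as a predicate on the name. The sketch below is plain Python; `keep_in_stub` and `CYTHON_INTERNAL_NAMES` are hypothetical, and the explicit name set is an assumption taken from the generated output shown earlier in this thread, not a complete list.

```python
# Names from the generated output above that Cython adds itself or that
# are C-level lifecycle hooks. Illustrative assumption, not exhaustive.
CYTHON_INTERNAL_NAMES = frozenset({
    "__reduce_cython__",
    "__setstate_cython__",
    "__cinit__",
    "__dealloc__",
})


def keep_in_stub(name, drop_all_dunder=False):
    """Decide whether a function or method name belongs in the stub."""
    if drop_all_dunder:
        # Strict variant: exclude everything starting with a double underscore.
        return not name.startswith("__")
    # Lenient variant: exclude only __pyx_ names and the known internals.
    return not (name.startswith("__pyx_") or name in CYTHON_INTERNAL_NAMES)
```

The strict variant also drops ordinary special methods such as `__init__`, which type checkers do care about, so the lenient variant may be the more useful default.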

Declarations such as global_typed: int without a value are swallowed by the parser, this probably needs to be rectified.

Not sure if it's really the parser. It is probably some later transform, such as OptimizeBuiltinCalls. Look at visit_ExprStatNode, in case there are other places. It might be OK to keep only those variable references that carry annotations.
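
To illustrate the "keep only those that carry annotations" rule, here is a sketch that uses CPython's own `ast` module on plain Python source as a stand-in for Cython's tree. It does not touch Cython's transforms; `annotated_globals` is a hypothetical helper, and `ast.unparse` requires Python 3.9 or later.

```python
import ast


def annotated_globals(source):
    """Return (name, annotation_source) pairs for module-level annotated names."""
    result = []
    for node in ast.parse(source).body:
        # AnnAssign covers both 'x: int' and 'x: int = 0'; plain
        # assignments such as 'x = 0' are ast.Assign and are skipped.
        if isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
            result.append((node.target.id, ast.unparse(node.annotation)))
    return result
```

Applied to the two globals from the test input, only `global_typed` would survive, since `global_untyped = 0` carries no annotation.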

Retain type variables (of the form T = TypeVar("T"))

Oh, yes. Tricky one. You might start by adding only assignments to the visitor and tracking names imported from the typing module. We already do that for the cython module in the InterpretCompilerDirectives transform. Might be a good time to extract that functionality into a separate transform (or mixin class) because we'd also generally need it for tracking usages of the typing module at some point (which is exactly your use case here).
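
A rough sketch of that tracking, again using CPython's `ast` on plain Python source rather than Cython's tree or the InterpretCompilerDirectives transform: record the local names bound to the typing module, the names imported from it, and any assignment whose value is a call to `TypeVar` through one of those names. `collect_typing_names` is a hypothetical helper.

```python
import ast


def collect_typing_names(source):
    """Return (module_aliases, imported_names, typevars) found at module level."""
    module_aliases = set()   # local names bound to the typing module itself
    imported_names = {}      # local name -> original name in typing
    typevars = []            # names assigned from a TypeVar(...) call

    for node in ast.parse(source).body:
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "typing":
                    module_aliases.add(alias.asname or alias.name)
        elif isinstance(node, ast.ImportFrom):
            if node.module == "typing" and node.level == 0:
                for alias in node.names:
                    imported_names[alias.asname or alias.name] = alias.name
        elif isinstance(node, ast.Assign) and isinstance(node.value, ast.Call):
            func = node.value.func
            is_typevar = (
                # T = TypeVar("T") after 'from typing import TypeVar'
                isinstance(func, ast.Name)
                and imported_names.get(func.id) == "TypeVar"
            ) or (
                # T = typing.TypeVar("T") or via an alias of the module
                isinstance(func, ast.Attribute)
                and func.attr == "TypeVar"
                and isinstance(func.value, ast.Name)
                and func.value.id in module_aliases
            )
            if is_typevar:
                typevars.extend(
                    t.id for t in node.targets if isinstance(t, ast.Name)
                )
    return module_aliases, imported_names, typevars
```

The stub writer could then re-emit exactly the assignments listed in `typevars` and drop all other module-level assignments.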

Maybe add a directive to enable / disable generation of type stubs?!

Sounds more like an option to me, but should be handled in the same way as annotate, i.e. an option that is internally treated (and usable) as a directive.

Add some test code

Sure. You can base that on the CodeWriter tests as well, see Cython.Tests.TestCodeWriter.

@da-woods
Contributor
da-woods commented Sep 8, 2020

Determining types for cpdef correctly: I am using a combination of the py_type_name of the PyrexType and the type.name of the node method so far. Unfortunately, for an int, the py_type_name returns '(int, long)' see here. What is more, an object type is probably a little useless...

I ended up having to solve a very similar problem for my dataclasses PR. My approach was GetTypeNode. It's currently unreviewed (so may have issues) and also may not match exactly what you want, but it might be worth a look seeing if you can borrow that code?

@ax487
Contributor Author
ax487 commented Sep 8, 2020

Thank you for the feedback and support. The first draft was more of a prototype to convey the ideas behind the stub generator. I am glad you agree with the basic architecture, so I will just go on until I have a working prototype 😄

@scoder
Contributor
scoder commented Sep 9, 2020

agree with the basic architecture

Well, most of this should still reuse the existing DeclarationWriter, or a common base class. Just reusing the indentation part still leads to a lot of code duplication that can be avoided. Stub files aren't inherently different from .pxd files. Improvements for one should ideally benefit the other as well.

@ax487
Contributor Author
ax487 commented Sep 9, 2020

Well, there is definitely some overlap between writing pxd and pyi files, but there are also some differences. For example, the declaration of a cdef class corresponds to an ordinary class in the stub file. Also, the number of node types visited by both DeclarationWriter and the TypeStubGenerator is fairly small. But I will try my best to avoid duplication nonetheless.

@da-woods: Thank you for the suggestions. Regarding tests, I think I will try to rig something up using round trips via the python ast...
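
A round trip through the Python `ast` module could look roughly like this: parse the generated stub text, fail the test if it is not valid Python, and compare the recovered signatures against what the source declared. `stub_signatures` and `is_valid_stub` are hypothetical test helpers, not part of Cython's test suite.

```python
import ast


def is_valid_stub(stub_text):
    """Return True if the stub text parses as Python source."""
    try:
        ast.parse(stub_text)
    except SyntaxError:
        return False
    return True


def stub_signatures(stub_text):
    """Parse stub text and return {qualified_name: positional_argument_names}."""
    signatures = {}

    def visit(body, prefix):
        for node in body:
            if isinstance(node, ast.FunctionDef):
                # Only plain positional parameters; *args/**kwargs are ignored.
                signatures[prefix + node.name] = [a.arg for a in node.args.args]
            elif isinstance(node, ast.ClassDef):
                visit(node.body, prefix + node.name + ".")

    visit(ast.parse(stub_text).body, "")
    return signatures
```

A check like `is_valid_stub` would already flag the `class Outer` line in the prototype output above, which is missing its trailing colon.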

"__releasebuffer__",
"__getreadbuffer__",
"__getwritebuffer__",
"___getsegcount__",
Contributor

___ – typo or intended?

@scoder
Contributor
scoder commented Sep 10, 2020

the number of node types visited by both DeclarationWriter and the TypeStubGenerator is fairly small.

Since the CodeWriter module has a pretty complete Python code generator, the total overlap is probably quite large, in fact. I think most of the differences can be handled by adding suitable small hook methods that the stub file writer can overwrite when it needs to generate slightly different output (ellipses etc.).
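
The hook-method idea can be shown with a toy pair of classes. These are not the real `Cython.CodeWriter` classes: `DeclarationWriterSketch`, `StubWriterSketch` and their method names are invented for illustration, and the "tree" is reduced to tuples of `(method_name, args, is_cpdef)`.

```python
class DeclarationWriterSketch:
    """Toy writer; stands in for a shared base class (hypothetical)."""

    indent = "    "

    # Small hooks that a subclass may override.
    def class_header(self, name, is_cdef):
        return ("cdef class %s:" if is_cdef else "class %s:") % name

    def function_header(self, name, args, is_cpdef):
        keyword = "cpdef" if is_cpdef else "def"
        return "%s %s(%s):" % (keyword, name, ", ".join(args))

    def function_body(self):
        return "pass"

    # Shared traversal and layout logic lives in the base class only.
    def write_class(self, name, methods, is_cdef=False):
        lines = [self.class_header(name, is_cdef)]
        for method_name, args, is_cpdef in methods:
            header = self.function_header(method_name, args, is_cpdef)
            lines.append(self.indent + header + " " + self.function_body())
        return "\n".join(lines) + "\n"


class StubWriterSketch(DeclarationWriterSketch):
    """Overrides only the hooks where .pyi output differs."""

    def class_header(self, name, is_cdef):
        return "class %s:" % name  # cdef classes become ordinary classes

    def function_header(self, name, args, is_cpdef):
        return "def %s(%s):" % (name, ", ".join(args))

    def function_body(self):
        return "..."  # stubs use an ellipsis instead of a body
```

Indentation and traversal are written once, so a fix there would benefit both outputs, while the stub-specific differences stay in three short overrides.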

@tobiasdiez

What's the status of this PR? Automatically generating typing stubs would be a very nice feature!

@scoder added the feature and Python Semantics (General Python (3) language semantics) labels Nov 29, 2021
@scoder
Contributor
scoder commented Nov 29, 2021

I think the original author stopped working on it. Help is welcome.

Look through my previous comments to see what needs to be changed.

@Vizonex
Vizonex commented Jun 24, 2023

@scoder I like this change a lot, and I think I could resume this branch and fix that typo if you want me to. It has been almost two full years since the author abandoned the project, so some other fixes might be needed, such as variable name changes. My reasons for finishing this are simple: I'm a VS Code user, and I find it very annoying to have to write large numbers of stub files by hand. I think this will be beneficial to projects such as PyImgui and Pyducktape2, which lack stub files. Having these tools available may help those projects out tremendously. But hey, this isn't my first rodeo making a large solution to a problem and it won't be my last either.

@scoder
Contributor
scoder commented Jun 24, 2023 via email

@Vizonex
Vizonex commented Jun 24, 2023

I agree with that as well. Having conversions like the example I wrote below is what I've always wanted.

# myexample.pyx
cpdef int add(int a, int b):
    return a + b
# myexample.pyi
def add(a: int, b: int) -> int: ...

I also just took a look at Shadow.pyi, and I think it would be a good idea to use those variables to help with type hinting, similar to how mypy imports things from typing...

My only question is this: should I make a new fork off of the old branch, or make an entirely new fork and bring in some of the things this author worked on as a template?

@RaubCamaioni

Compilation of PYX (cython) -> PYI (stub) file converters.

  • CyStub: uses cython compiler to generate PYX -> PYI
  • cythonbuilder: custom parser PYX -> PYI
  • CythonPeg: pyparsing custom definitions PYX -> PYI
  • stubgen: uses python runtime objects to generate PYI files

I should have done a more complete search earlier. I would have considered using the cython compiler as the parser.
I started a conversation on the cython-users mailing list to compile additional converter options (if it gets approved).
If you know of additional PYX -> PYI code or methods, post them there.

@Vizonex
Vizonex commented Aug 7, 2023 via email

@moi90
moi90 commented Sep 3, 2023

Just from glancing at the linked pages, it seems that CyStub by @Vizonex is the most complete solution at the moment, right? However, it does not work anymore with Cython 3...

Stubgen is a mostly stable solution, but is unable to infer most types from Cython code. (They just become Any which is not very helpful.)

As the author of this pull request stopped working on it, maybe it should be closed?

@Vizonex
Vizonex commented Sep 6, 2023 via email

@Vizonex
Vizonex commented Sep 22, 2023

@moi90 I think the only reason I have been stuck on writing something good for my current .pyx to .pyi compiler, and on opening a pull request, is that I'm trying to implement a feature similar to what PyLance does: resolving return type annotations even when the programmer hasn't written any. The tricky part is that Pyrex types tend to hide certain variable types, such as built-in types, and commonly report them as object, which is not very useful if the programmer has declared something completely different. I have been stuck on the idea of resolving these annotations for months with no success. But I'll just start working on the fork for Cython now without the feature of resolving these missing return type annotations until I have figured it out.
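
To make the problem concrete, here is a deliberately naive sketch of return type resolution. It is not how PyLance or Cython's type inference works: it uses CPython's `ast` on plain Python source, only understands literal return values, and also looks at returns of nested functions. `infer_return_annotation` is a hypothetical helper, and `ast.unparse` requires Python 3.9 or later.

```python
import ast

_SIMPLE_LITERALS = {"int", "float", "str", "bytes", "bool"}


def infer_return_annotation(func_source):
    """Guess a return annotation from literal `return` values, else None."""
    func = ast.parse(func_source).body[0]
    if not isinstance(func, ast.FunctionDef):
        raise ValueError("expected a single function definition")
    if func.returns is not None:
        # An explicit annotation always wins.
        return ast.unparse(func.returns)

    kinds = set()
    for node in ast.walk(func):
        if not isinstance(node, ast.Return):
            continue
        if node.value is None:
            kinds.add("None")
        elif isinstance(node.value, ast.Constant):
            value = node.value.value
            kind = "None" if value is None else type(value).__name__
            if kind != "None" and kind not in _SIMPLE_LITERALS:
                return None
            kinds.add(kind)
        else:
            # Any non-literal return value defeats this naive approach.
            return None

    if not kinds:
        return "None"  # no return statement at all
    return kinds.pop() if len(kinds) == 1 else None
```

Anything beyond literals (names, calls, attribute access) returns `None` here, which is exactly the part that needs real type information from the compiler.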

@moi90
moi90 commented Sep 23, 2023

But I'll just start working on the fork for Cython now without the feature of resolving these missing return type annotations until I have figured it out.

That sounds totally reasonable! :)

(I'm sorry, I'm not of much help. I have very limited time and no prior knowledge on this issue. I just would like to use it ^^)

@Vizonex
Vizonex commented Sep 28, 2023

Update: my PyiWriter class has been added to my fork of Cython, but it still requires some bug fixing. I also need to find a place to add it to the pipeline, and there needs to be a way to output the file, so more research will be needed. I won't be opening the pull request just yet, but it's now set in motion. I thank all of you for inspiring me to get this feature implemented. https://github.com/Vizonex/cython/commit/e4234121210321a00ee64aaec5fc307ece256fee

@leo-hstone

Hi! I just started looking into this topic and want to add another item to the list of @RaubCamaioni. The vscode cython extension has an experimental feature to generate stub files: https://github.com/ktnrg45/vs-code-cython

Unfortunately, I do not have the skills to check whether it is based on one of the items listed. Maybe it is of help for this feature.

Labels
feature Python Semantics General Python (3) language semantics Type Analysis