8000 [inductor] Add Python type annotations to `torch/_inductor` · Issue #146167 · pytorch/pytorch · GitHub
[go: up one dir, main page]

Skip to content

[inductor] Add Python type annotations to torch/_inductor #146167

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
3 of 9 tasks
rec opened this issue Jan 31, 2025 · 6 comments
Open
3 of 9 tasks

[inductor] Add Python type annotations to torch/_inductor #146167

rec opened this issue Jan 31, 2025 · 6 comments
Assignees
Labels
better-engineering Relatively self-contained tasks for better engineering contributors module: inductor module: typing Related to mypy type annotations oncall: pt2 triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Comments

@rec
Copy link
Collaborator
rec commented Jan 31, 2025

🚀 The feature, motivation and pitch

Type annotations make new development and maintenance easier, and sometimes find bugs.

And torch/_inductor is tricky, and under constant modification by disparate developers.

How?

Adding annotations occasionally finds latent bugs, but the real payoff is in faster and more accurate maintenance and new development that using that annotated code.

If we knew which files, classes and functions were going to be used in future development, we could prioritize annotating those.

What we can measure is what gets imported in existing code.

This little script gives the following sorted counts of imports from _inductor over all of torch/:

  • .pattern_matcher: 486
  • .utils: 324
  • .ir: 137
  • .codegen.common: 131
  • .virtualized: 111
  • .codecache: 59
  • .lowering: 54
  • .scheduler: 52
  • ... a lot more

So there is an import for either torch._inductor.pattern_maker, or a symbol contained within it, 486 times within torch/.

Deliverable (per file)

  • Removal of # mypy: allow-untyped-defs and ignore-errors statements
  • Evaluate :# mypy: allow-untyped-decorators (possibly keep, typing decorators correctly is arduous)
  • For already-typed files, quickly check typing on the most imported symbols

That script above also drills down into individual symbols, for example:

  "torch._inductor.pattern_matcher": {
    "CallFunction": 32,
    "KeywordArg": 30,
    "Arg": 29,
    "CallFunctionVarArgs": 27,
    "Ignored": 26,
    "ListOf": 26,

Tracking

Alternatives

Fumbling ahead with an ongoing ignorance of type information. 😁

cc @ezyang @malfet @xuzhao9 @gramster @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @amjames @desertfire @chauhang @aakhundov

@rec rec added better-engineering Relatively self-contained tasks for better engineering contributors module: inductor module: typing Related to mypy type annotations labels Jan 31, 2025
@rec rec self-assigned this Jan 31, 2025
rec added a commit that referenced this issue Feb 3, 2025
@williamwen42 williamwen42 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Feb 6, 2025
@zeshengzong
Copy link
Contributor

@rec Hello, I would like try to help with virtualized.py if not one working on it, is that ok? Thanks!

@rec
Copy link
Collaborator Author
rec commented Feb 7, 2025

@zeshengzong: Sure, that'd be great, thanks for asking!

I updated the list of files with the files you and I are working on, I should have done that before.

CC me on any review.

@rec
Copy link
Collaborator Author
rec commented Feb 8, 2025

@zeshengzong - so as I'm going on, I'm finding that nearly all the real typing issues that aren't just "declare a parameter" are what I call chameleon variables - a variable that has a lot of possible types on input, but quickly becomes one single type before it gets used.

For example, a parameter variable might have typeUnion[None, int, Expr, Sequence[int], Sequence[Expr]] but after a few operations, all other types are converted to Sequence[Expr].

The type checker might be able to deduce this type, except the code uses the same input variable, which is known to accept all these types.

In many cases this can be fixed by using a new variable:

e: Union[None, int, Expr, Sequence[int], Sequence[Expr]]
if isinstance(e, Expr):
  ex = [e]
elif isinstance(e, int):
  ex = [Expr(e)]
# etc

In other cases, the ambiguity is baked into the API, so I use typing.cast( if I can because it doesn't really(*) change the existing behavior (whereas an assert potentially does).

I have still been using assert callable but it strikes me that we should be using cast there too...

On the other hand, there is an extra call to an identity function. It isn't free, and if it's being traced through Dynamo, will give different results.

Asserts are potentially 100% free if you run Python in optimize mode (-O or -OO) and I'm not sure if Dynamo traces them because of that.


Now I don't even know what to do! 😀

I'm going to look around and see what I can find, and think about the issue...

@rec
Copy link
Collaborator Author
rec commented Feb 8, 2025

So I spent some time thinking about it, and here's what I came up with.

First, the CPU cost of an empty function call or an assert is tiny and shouldn't be considered.

Second, even though the asserts are covering cases that never occur today (seemingly in every case I've looked into, we'd immediately fail anyway), if they do ever trigger, it will make debugging easier.

And assertions can be switched off for maximum speed.

So assertions it is!

@rec
Copy link
Collaborator Author
rec commented Mar 4, 2025

@zeshengzong: All right, we have a new commit.

I learned a lot from doing this, please hit me up with questions. 🙂

@rec
Copy link
Collaborator Author
rec commented Mar 20, 2025

I'm almost finished with the ir.py linting.

The main lesson was that doing the whole file at one time was a bad idea. 😬 Making the changes themselves wasn't that hard - the trouble was getting it to not break unit tests, and while you're working that out, there are huge merges.

Worse, I ran into a new issue, where tests fail on CI but pass locally. I started out with something like 80 errors, 25 of which worked locally, the other 55 of which I could reproduce exactly. I cleaned away the 55 errors I could reproduce, but those 25 are left.(*) I'm using the CI and reading code to track this down, but this is disconcerting, first time in almost a year at pytorch.

Unfortunately, splitting up the pull request generates so many errors in the first half it's not worth it, so I'm charging to the finish....

(* - it looks at this moment as if there are no significant errors left, but that's because I'm testing a portion of the whole.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
better-engineering Relatively self-contained tasks for better engineering contributors module: inductor module: typing Related to mypy type annotations oncall: pt2 triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

No branches or pull requests

3 participants
0