[ca] no longer require is_traceable annotations for c++ autograd functions by xmfan · Pull Request #146229 · pytorch/pytorch · GitHub
[ca] no longer require is_traceable annotations for c++ autograd functions #146229


Closed
wants to merge 3 commits

Conversation

xmfan
Member
@xmfan xmfan commented Feb 1, 2025

Stack from ghstack (oldest at bottom):

This PR removes the compiled autograd (CA) compile-time error for C++ autograd functions, and supports them by having Dynamo graph break on them (instead of using allow_in_graph). The CppNode collects are kept as-is for now.
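The behavioral change can be sketched as follows. This is a hypothetical illustration, not the actual PyTorch code: before this PR, compiled autograd raised a hard error for C++ autograd functions lacking an `is_traceable` annotation; after it, they are accepted and Dynamo graph breaks on them instead.

```python
# Hypothetical sketch of the policy change (illustrative names, not the
# real PyTorch internals): non-traceable C++ autograd functions used to
# be a compile-time error; now they become Dynamo graph breaks.

def handle_cpp_autograd_fn(is_traceable: bool, after_pr: bool) -> str:
    if not after_pr and not is_traceable:
        # Old behavior: hard error during compiled autograd capture.
        raise RuntimeError(
            "compiled autograd: C++ function must be annotated is_traceable"
        )
    if is_traceable:
        return "allow_in_graph"  # traced inline by Dynamo
    return "graph_break"         # Dynamo falls back to eager for this node

print(handle_cpp_autograd_fn(False, after_pr=True))  # graph_break, not an error
```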

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @amjames @desertfire @chauhang @aakhundov

[ghstack-poisoned]
pytorch-bot bot commented Feb 1, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/146229

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ You can merge normally! (2 Unrelated Failures)

As of commit b313e9f with merge base 87a63a9:

UNSTABLE - The following jobs failed, but were likely due to flakiness present on trunk and have been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot temporarily deployed to upload-benchmark-results February 1, 2025 02:20 Inactive
@xmfan xmfan requested review from jansel and zou3519 February 3, 2025 18:23
@albanD albanD removed their request for review February 3, 2025 20:14
```diff
 auto fn_name = pyinterface->bind_function(
     saved.get_py_compiler(),
     std::string(typeid(T).name()),
     CppNode_apply_functional_ivalue<T>,
     schema,
-    /*is_custom_function*/ true);
+    /*is_custom_function*/ true,
+    /*is_traceable*/ T::is_traceable);
```
Contributor
What's the default for is_traceable?

Member Author

false, and always will have to be false

Comment on lines 88 to 89
```python
if is_traceable:
    torch._dynamo.allow_in_graph(result)
```
Contributor
@zou3519 zou3519 Feb 4, 2025

Otherwise, what causes Dynamo to graph break on it? The SkipFiles error?
(It might be better for readability and for the error messages+logging if we explicitly torch._dynamo.skip the non-traceable things)

Member Author

Something about pybind. Changed it to a disable, and had to wrap it in a fn to make the decorator work.
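The pybind issue mentioned above can be sketched in plain Python. This is a hypothetical illustration (the `disable` stand-in is not the real `torch._dynamo.disable`): decorators that attach attributes fail on C-level callables such as builtins and pybind11-bound functions, which reject attribute assignment, so the bound function is wrapped in a Python-level function first.

```python
# Illustrative sketch, not the actual PyTorch internals: a decorator that
# tags the function it wraps cannot be applied directly to a C-level
# (builtin / pybind-bound) callable.

def disable(fn):
    # Stand-in for a decorator like torch._dynamo.disable that needs to
    # attach state to the function it wraps.
    fn._dynamo_disable = True
    return fn

# Builtins (and pybind11-bound functions) reject new attributes:
try:
    disable(len)
except AttributeError:
    pass  # cannot decorate the C-level callable directly

# Wrapping it in a Python function makes the decorator applicable:
@disable
def wrapped_len(x):
    return len(x)
```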

Contributor
@zou3519 zou3519 left a comment

nice

[ghstack-poisoned]
@pytorch-bot pytorch-bot bot temporarily deployed to upload-benchmark-results February 4, 2025 22:18 Inactive
[ghstack-poisoned]
@xmfan
Member Author
xmfan commented Feb 5, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Feb 5, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorchmergebot pushed a commit that referenced this pull request Feb 5, 2025
This PR accumulates compile reasons inside each CacheNode, and logs them to tlparse on each CA compile. This defines a compile as an autograd structure change, and a recompile as a dynamic shape change.

sample tlparse: https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpdbo7gt/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=100

for compiles:
```python
[
  "!0: Cache miss due to new autograd node: torch::autograd::GraphRoot (NodeCall 0) with key size 39, previous key sizes=[]"
]
```

for recompiles:
```python
[
  "!0: Cache miss due to new autograd node: torch::autograd::GraphRoot (NodeCall 0) with key size 39, previous key sizes=[]",
  "!1: Cache miss due to 7 changed tensor shapes (total of 7): sizes[0], sizes[1], sizes[2], sizes[3], sizes[4], sizes[5], sizes[6]"
]
```
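The per-node reason accumulation described in this commit message could be sketched as follows. This is a hypothetical illustration (the real CacheNode lives in C++; the names here are made up):

```python
# Illustrative sketch of accumulating cache-miss reasons per CacheNode,
# prefixing each entry "!N:" with its miss index as in the tlparse
# samples above. Not the actual PyTorch implementation.

class CacheNode:
    def __init__(self):
        self.compile_reasons = []

    def cache_miss(self, reason: str) -> None:
        self.compile_reasons.append(f"!{len(self.compile_reasons)}: {reason}")

root = CacheNode()
root.cache_miss("Cache miss due to new autograd node: torch::autograd::GraphRoot")
root.cache_miss("Cache miss due to 7 changed tensor shapes")
print(root.compile_reasons)
```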

Pull Request resolved: #146386
Approved by: https://github.com/jansel
ghstack dependencies: #146229
@github-actions github-actions bot deleted the gh/xmfan/168/head branch March 8, 2025 01:52
4 participants