8000 [precompile] Add BundledAOTAutogradCacheEntry by jamesjwu · Pull Request #152840 · pytorch/pytorch · GitHub
[precompile] Add BundledAOTAutogradCacheEntry #152840


Open · wants to merge 5 commits into base: gh/jamesjwu/146/base
Conversation

@jamesjwu (Contributor) commented May 5, 2025

Stack from ghstack (oldest at bottom):

Finally, this PR adds BundledAOTAutogradCacheEntry. A BundledAOTAutogradCacheEntry is an AOTAutogradCacheEntry that saves the entire CompiledFxGraph directly in the entry.

This has some advantages:

  • No more dependency on FxGraphCache at all
  • Clearing FxGraphCache does not result in AOTAutogradCache miss
  • Simpler logic, as BundledAOTAutogradCacheEntry has everything you need to load a full compiled Python wrapper from a Dynamo output

We plan to use BundledAOTAutogradCacheEntry for precompile. There's also a question of whether we want to use it for regular caching. The main disadvantage there is having to save the same CompiledFxGraph twice: once in the Inductor cache and once in the AOTAutogradCache entry. With MegaCaching, this could be a regression in total cache size (as well as a minor cold-start regression, since the same graph is saved twice). I will import this and measure the mega cache space usage, and if it looks good I'll enable it by default for regular caching as well.

On warm start, if AOTAutogradCache hits, you won't have to load Inductor at all, so warm-start overhead should be unaffected.
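To make the structural difference concrete, here is a minimal, self-contained sketch. The types below are hypothetical stand-ins for illustration, not PyTorch's real classes: a classic entry stores only a key into FxGraphCache (so loading requires a second lookup that can independently miss), while a bundled entry carries the whole CompiledFxGraph inline, so clearing FxGraphCache cannot invalidate it.

```python
import pickle
from dataclasses import dataclass


# Hypothetical stand-ins for PyTorch's real types, for illustration only.
@dataclass
class CompiledFxGraph:
    cache_key: str
    artifact: bytes  # e.g. the compiled Python wrapper source


@dataclass
class AOTAutogradCacheEntry:
    # Classic entry: stores only the FxGraphCache key; loading it requires
    # a second lookup in FxGraphCache, which can independently miss.
    fw_graph_key: str


@dataclass
class BundledAOTAutogradCacheEntry:
    # Bundled entry: stores the whole CompiledFxGraph inline, so a hit
    # here is sufficient to rebuild the compiled wrapper with no
    # dependency on FxGraphCache at all.
    fw_graph: CompiledFxGraph


def roundtrip(entry):
    """Serialize and deserialize an entry, as a cache save/load would."""
    return pickle.loads(pickle.dumps(entry))


graph = CompiledFxGraph(cache_key="fx-abc123", artifact=b"def wrapper(): ...")
bundled = roundtrip(BundledAOTAutogradCacheEntry(fw_graph=graph))
# The bundled entry carries the full graph; no FxGraphCache lookup needed.
assert bundled.fw_graph.cache_key == "fx-abc123"
```

The trade-off discussed above also falls out of this shape: if Inductor's own cache still stores the graph under `cache_key`, the bundled entry duplicates those bytes.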

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames

Differential Revision: D74593304

[ghstack-poisoned]
pytorch-bot (bot) commented May 5, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152840

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 7b181d5 with merge base 9d00f2b:

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jamesjwu added a commit that referenced this pull request May 5, 2025
ghstack-source-id: 05bcce8
Pull Request resolved: #152840
@jamesjwu changed the title from Add BundledAOTAutogradCacheEntry to [precompile] Add BundledAOTAutogradCacheEntry May 5, 2025
@jamesjwu added the topic: not user facing label (topic category) May 5, 2025
[ghstack-poisoned]
jamesjwu added a commit that referenced this pull request May 8, 2025
ghstack-source-id: 0de7afe
Pull Request resolved: #152840
[ghstack-poisoned]
jamesjwu added a commit that referenced this pull request May 8, 2025
ghstack-source-id: 1846951
Pull Request resolved: #152840
[ghstack-poisoned]
jamesjwu added a commit that referenced this pull request May 9, 2025
ghstack-source-id: 237f9b3
Pull Request resolved: #152840
[ghstack-poisoned]
jamesjwu added a commit that referenced this pull request May 9, 2025
ghstack-source-id: a9fe63d
Pull Request resolved: #152840
@jamesjwu added the ciflow/trunk (Trigger trunk jobs on your pull request), ciflow/mps (Run MPS tests, subset of trunk), and ciflow/pull labels and removed the ciflow/mps label May 9, 2025
@jamesjwu requested review from oulgen, zhxchen17 and bdhirsh May 9, 2025 18:08
@jamesjwu marked this pull request as ready for review May 9, 2025 18:08
if config.bundled_autograd_cache:
# Helper function to unwrap all the wrappers we added during aotdispatch
# They get reapplied on cache load
def unwrap_compiled_fx_graph(obj):
jamesjwu (Contributor, Author) commented:
So normally, the compiled_fx_graph that AOTAutogradCache passes to the entry is actually wrapped a few times by wrappers like FunctionalizedRngWrapper, etc. For the non-bundled case this didn't matter, because we only care about the cache key from the compiled FX graph.

But now that we are storing the actual compiled FX graph, we need to save the inner CompiledFxGraph object. The object then gets rewrapped properly in post_compile (the same way an FxGraphCacheLoadable would be).

I also attempted to refactor jit_compile_runtime_wrappers.py to just save to the cache immediately, before doing the post-compile work, but some variables, like num_symints_saved_for_bw, get calculated only after certain other wrappers run. Disentangling all of that logic seemed both risky and not really worth it, when it costs basically nothing to just grab the inner wrapped object using this helper function.
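The unwrap-then-rewrap idea described above can be sketched as follows. This is an illustration only: the wrapper class and its `inner` attribute are hypothetical stand-ins, not the actual aotdispatch wrapper API.

```python
# Illustrative sketch: names like `inner` and this wrapper class are
# hypothetical, not PyTorch's actual aotdispatch wrappers.
class CompiledFxGraph:
    def __call__(self, *args):
        return args


class FunctionalizedRngWrapper:
    """Example of a wrapper applied around the graph during post-compile."""

    def __init__(self, inner):
        self.inner = inner

    def __call__(self, *args):
        # ... wrapper bookkeeping (e.g. RNG functionalization) would go here ...
        return self.inner(*args)


def unwrap_compiled_fx_graph(obj):
    # Peel wrappers until we reach the raw CompiledFxGraph; only that inner
    # object is serialized into the bundled entry, and post_compile reapplies
    # the wrappers when the entry is loaded.
    while not isinstance(obj, CompiledFxGraph):
        obj = obj.inner
    return obj


graph = CompiledFxGraph()
wrapped = FunctionalizedRngWrapper(FunctionalizedRngWrapper(graph))
assert unwrap_compiled_fx_graph(wrapped) is graph
```

Because unwrapping is a cheap attribute walk, this avoids the riskier refactor of reordering the cache save relative to the wrapper passes.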

@jamesjwu has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
