-
Notifications
You must be signed in to change notification settings - Fork 24.2k
[precompile] Add BundledAOTAutogradCacheEntry #152840
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: gh/jamesjwu/146/base
Are you sure you want to change the base?
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152840
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (2 Unrelated Failures)As of commit 7b181d5 with merge base 9d00f2b ( UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
if config.bundled_autograd_cache: | ||
# Helper function to unwrap all the wrappers we added during aotdispatch | ||
# They get reapplied on cache load | ||
def unwrap_compiled_fx_graph(obj): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So normally, the compiled_fx_graph that AOTAutogradCache passes to the entry is actually wrapped a few times by wrappers like FunctionalizedRngWrapper, etc. For the non bundled case, this didn't matter, because we only care about the cache key from the compiled fx graph.
But now that we are storing the actual compiled fx graph, we need to save the inner CompiledFxGraph object. The object then gets rewrapped properly on post_compile (the same way as an FxGraphCacheLoadable would be).
I also attempted to refactor jit_compile_runtime_wrappers.py to just save to the cache immediately before doing post compile stuff, but there's some variables like num_symints_saved_for_bw and others that get calculated only after certain other wrappers run. Disentangling all of that logic seemed both risky and not really worth it, when it costs basically nothing to just grab the inner wrapped object using this helper function.
@jamesjwu has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Stack from ghstack (oldest at bottom):
Finally, this PR adds BundledAOTAutogradCacheEntry. A BundledAOTAutogradCacheEntry is an AOTAutogradCacheEntry that saves the entire CompiledFxGraph directly in the entry.
This has some advantages:
We plan to use BundledAOTAutogradCacheEntry for precompile. There's also a question of whether we want to use it for regular caching — the main disadvantage of this is having to save the same CompiledFxGraph twice, once in Inductor cache and once for AOTAutogradCache. With MegaCaching, this could be a regression in total cache size (as well as a minor cold start regression, as you have to save the same graph twice). I will import this and measure the mega cache space complexity, and if it looks good I'll enable it by default for caching as well.
On warm start, if AOTAutogradCache hits, you won't have to load inductor at all, so warm start overhead should be unaffected.
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames
Differential Revision: D74593304