[cudagraphs][HF][torch 2.7] Excessive cudagraph re-recording for HF LLM models #152275
Labels
high priority
oncall: pt2
triaged
This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Comments
@malfet Would it be good to run such "no re-recording" and "no recompiles" HF transformers/LLM tests within PyTorch itself, at least for some models?
anijain2305 added commits that referenced this issue (Apr 28, 2025): related to #152275
pytorchmergebot pushed a commit that referenced this issue (Apr 28, 2025): related to #152275. Pull Request resolved: #152287. Approved by: https://github.com/bdhirsh, https://github.com/eellison
anijain2305 added commits that referenced this issue (Apr 29, 2025): …c_input_idxs" related to #152275
bdhirsh added commits that referenced this issue (Apr 29, 2025): …c_input_idxs" related to #152275
pytorchmergebot pushed a commit that referenced this issue (Apr 30, 2025): related to #152275. Pull Request resolved: #152287. Approved by: https://github.com/bdhirsh, https://github.com/eellison. Co-authored-by: Brian Hirsh <hirsheybar@fb.com>
pytorchbot pushed a commit that referenced this issue (May 4, 2025): related to #152275. Pull Request resolved: #152287. Approved by: https://github.com/bdhirsh, https://github.com/eellison. Co-authored-by: Brian Hirsh <hirsheybar@fb.com> (cherry picked from commit 4a63cab)
atalman pushed a commit that referenced this issue (May 6, 2025): [cudagraphs] Fix issue in collecting static_input_idxs (#152287), related to #152275. Pull Request resolved: #152287. Approved by: https://github.com/bdhirsh, https://github.com/eellison (cherry picked from commit 4a63cab). Co-authored-by: Brian Hirsh <hirsheybar@fb.com>
Closing, as this was fixed and validated in 2.7.1.
🐛 Describe the bug
The `transformers` repo has temporarily pinned the torch version to <2.7 (HF PR to block 2.7). I find that cudagraphs are re-recorded on every invocation. The issue is present on the `main` branch as well. Here is the tlparse; you can look at the perfetto traces, where frequent cudagraph recording is observed. The re-recording reason is `CheckInvariantStatus.StaticInputIdxMismatch`. This could be related to some piece missing in the handoff from Dynamo and AOTAutograd to Inductor. cc'ing @BoyuanFeng @eellison @mlazos @zou3519

To repro, build `transformers` from source and run this script.

Error logs
No response
Versions
NA
cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @chauhang @penguinwu
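To make the `CheckInvariantStatus.StaticInputIdxMismatch` re-recording reason more concrete, here is a toy model in plain Python. It assumes, as a simplification, that a recorded graph remembers the data pointers of the inputs it treats as static and re-records whenever one of them changes; it is not PyTorch's cudagraph trees code. All names (`ToyRecordedGraph`, `run_decode_steps`, the pointer values) are hypothetical. It only illustrates how an over-inclusive set of static input indices would force a re-record on every call, and is not a claim about the exact bug fixed in #152287.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, List, Optional


class ToyCheckStatus(Enum):
    # Hypothetical stand-in, loosely named after CheckInvariantStatus values.
    SUCCESS = auto()
    STATIC_INPUT_IDX_MISMATCH = auto()


@dataclass
class ToyTensor:
    # Only the storage address matters for this model.
    data_ptr: int


@dataclass
class ToyRecordedGraph:
    """Toy model of a recorded CUDA graph (assumption, not PyTorch's implementation).

    Inputs at `static_input_idxs` are assumed to keep the same address across
    calls (e.g. parameters), so their addresses are baked into the recording;
    any change forces a re-record.
    """

    static_input_idxs: List[int]
    recorded_ptrs: Optional[Dict[int, int]] = None
    num_recordings: int = 0

    def check_invariants(self, inputs: List[ToyTensor]) -> ToyCheckStatus:
        assert self.recorded_ptrs is not None
        for idx in self.static_input_idxs:
            if inputs[idx].data_ptr != self.recorded_ptrs[idx]:
                return ToyCheckStatus.STATIC_INPUT_IDX_MISMATCH
        return ToyCheckStatus.SUCCESS

    def record(self, inputs: List[ToyTensor]) -> None:
        # Snapshot the addresses of the inputs treated as static.
        self.recorded_ptrs = {i: inputs[i].data_ptr for i in self.static_input_idxs}
        self.num_recordings += 1

    def __call__(self, inputs: List[ToyTensor]) -> ToyCheckStatus:
        if self.recorded_ptrs is None:
            self.record(inputs)  # first call: record
            return ToyCheckStatus.SUCCESS
        status = self.check_invariants(inputs)
        if status is not ToyCheckStatus.SUCCESS:
            self.record(inputs)  # invariant broken: re-record instead of replaying
        return status


def run_decode_steps(static_input_idxs: List[int], steps: int = 5) -> int:
    """Simulate `steps` calls: input 0 (a weight) keeps its address, while
    input 1 (a per-step activation) gets a fresh address on every call."""
    graph = ToyRecordedGraph(static_input_idxs=static_input_idxs)
    weight = ToyTensor(data_ptr=0x1000)
    for step in range(steps):
        activation = ToyTensor(data_ptr=0x2000 + step)
        graph([weight, activation])
    return graph.num_recordings


print(run_decode_steps([0]))     # only the weight is treated as static
print(run_decode_steps([0, 1]))  # the activation is wrongly treated as static
```

With only the weight marked static, the toy graph is recorded once over five steps; if the per-step input is also counted as static, every call after the first re-records, giving five recordings for five steps.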