[cutlass backend] Reduce log level for cutlass runtime error #153457
base: gh/henrylhtsang/84/base
Differential Revision: [D74629230](https://our.internmc.facebook.com/intern/diff/D74629230/) [ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/153457
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEVs
There are 1 currently active SEVs. If your PR is affected, please view them below:
❌ 1 New Failure
As of commit 2896e50 with merge base f7798d8:
NEW FAILURE - The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Differential Revision: [D74629230](https://our.internmc.facebook.com/intern/diff/D74629230/) ghstack-source-id: 283608237 Pull Request resolved: #153457
This pull request was exported from Phabricator. Differential Revision: D74629230
Want to make sure we always call self.cleanup_run_fn() even if we crash.
I think this is the reason why sometimes we get
```
in _dlclose
TypeError: 'NoneType' object is not callable
```
Differential Revision: [D74629230](https://our.internmc.facebook.com/intern/diff/D74629230/)
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov
[ghstack-poisoned]
Pull Request resolved: #153457 ghstack-source-id: 284104202 @exported-using-ghexport Differential Revision: [D74629230](https://our.internmc.facebook.com/intern/diff/D74629230/)
torch/_inductor/select_algorithm.py
Outdated
if not isinstance(choice, CUDATemplateCaller):
    log.error(
        "CUDA compilation error during autotuning: \n%s. \nIgnoring this choice.",
Is this a compilation error or a runtime error?
torch/_inductor/autotune_process.py
Outdated
def dummy_function():
    raise RuntimeError(err_msg)
For readability, change the function name to something more descriptive, maybe raise_runtime_error().
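A minimal sketch of what the suggested rename could look like, assuming the function is handed back as a placeholder callable that fails when invoked. The factory name `make_failing_run_fn` is hypothetical and is not taken from the PR.

```python
from typing import Callable, NoReturn


def make_failing_run_fn(err_msg: str) -> Callable[[], NoReturn]:
    """Return a placeholder callable that raises when invoked (hypothetical helper)."""

    def raise_runtime_error() -> NoReturn:
        # The name states what the callable does, so it reads clearly
        # at the call site and in tracebacks.
        raise RuntimeError(err_msg)

    return raise_runtime_error
```

The inner function's name is what shows up in a traceback, which is the main readability gain over a generic name like dummy_function.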
torch/_inductor/select_algorithm.py
Outdated
if isinstance(choice, CUDATemplateCaller):
    log.debug(
        "Runtime error during autotuning: \n%s. \nIgnoring this choice.",
        msg,
        exc_info=True,
    )
else:
    if "illegal memory access" in msg:
        msg += "\n\nEither error in template or triton bug.\n"
    log.error(
        "Runtime error during autotuning: \n%s. \nIgnoring this choice.",
        msg,
    )
log.error(
    "Runtime error during autotuning: \n%s. \nIgnoring this choice.",
    msg,
)
it'd be easier to read if we just log this once, but change the log-level and exc_info depending on if it came from CUDATemplateCaller or not.
> it'd be easier to read if we just log this once, but change the log-level and exc_info depending on if it came from CUDATemplateCaller or not.

I thought about it a few times; I don't think there is an easy way to do it.
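For reference, one possible shape for the single-call version discussed in this thread is to compute the level and `exc_info` first and then call `Logger.log` once. This is only a sketch: `CUDATemplateCaller` is a stand-in stub, `log_autotune_runtime_error` is a hypothetical helper name, and keeping the "illegal memory access" hint on the non-CUTLASS path is an assumption based on the snippet as shown.

```python
import logging

log = logging.getLogger("autotune_sketch")


class CUDATemplateCaller:
    """Stand-in for the real class; only used for the isinstance check."""


def log_autotune_runtime_error(choice: object, msg: str) -> int:
    """Log the error once, choosing level and traceback by choice type."""
    is_cuda_template = isinstance(choice, CUDATemplateCaller)
    if not is_cuda_template and "illegal memory access" in msg:
        msg += "\n\nEither error in template or triton bug.\n"
    # CUTLASS choices log quietly with a traceback; everything else stays at ERROR.
    level = logging.DEBUG if is_cuda_template else logging.ERROR
    log.log(
        level,
        "Runtime error during autotuning: \n%s. \nIgnoring this choice.",
        msg,
        exc_info=is_cuda_template,
    )
    return level
```

Whether this reads better than the two explicit branches is a judgment call; the branches make the two behaviors more visible at a glance.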
Pull Request resolved: #153457 ghstack-source-id: 284547176 @exported-using-ghexport Differential Revision: [D74629230](https://our.internmc.facebook.com/intern/diff/D74629230/)
Stack from ghstack (oldest at bottom):
Want to make sure we always call self.cleanup_run_fn() even if we crash.
I think this is the reason why we sometimes get `TypeError: 'NoneType' object is not callable` in `_dlclose`.
Differential Revision: D74629230
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov
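The intent stated in the description, running cleanup even when the benchmark crashes, is commonly expressed with try/finally. The sketch below illustrates that pattern only; `BenchmarkRequestSketch` and its attributes are hypothetical stand-ins and do not reproduce the PyTorch classes or the actual change in this PR.

```python
from typing import Callable


class BenchmarkRequestSketch:
    """Hypothetical stand-in for a benchmark request; not the PyTorch class."""

    def __init__(self, run_fn: Callable[[], float]) -> None:
        self.run_fn = run_fn
        self.cleaned_up = False

    def cleanup_run_fn(self) -> None:
        # Real code would release whatever run_fn holds on to (assumption).
        self.cleaned_up = True

    def benchmark(self) -> float:
        try:
            return self.run_fn()
        finally:
            # Runs on both success and failure, so cleanup is never skipped.
            self.cleanup_run_fn()
```

The exception still propagates to the caller after cleanup, so the autotuner can log it and ignore the choice as before.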