[cutlass backend] Reduce log level for cutlass runtime error by henrylhtsang · Pull Request #153457 · pytorch/pytorch · GitHub

[cutlass backend] Reduce log level for cutlass runtime error #153457


Open
henrylhtsang wants to merge 3 commits into base: gh/henrylhtsang/84/base

Conversation

henrylhtsang (Contributor) commented May 13, 2025

pytorch-bot bot commented May 13, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/153457

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure

As of commit 2896e50 with merge base f7798d8:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

henrylhtsang added a commit that referenced this pull request May 13, 2025
Differential Revision: [D74629230](https://our.internmc.facebook.com/intern/diff/D74629230/)

ghstack-source-id: 283608237
Pull Request resolved: #153457
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D74629230

We want to make sure `self.cleanup_run_fn()` is always called, even if we crash.

I think this is why we sometimes get:
```
in _dlclose
TypeError: 'NoneType' object is not callable
```
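
A minimal sketch of the "always clean up, even on a crash" pattern described above, using `try`/`finally`. Only the name `cleanup_run_fn` comes from this PR; the class `FakeBenchmarkRequest`, its `make_run_fn` and `benchmark` methods, and the simulated failure are hypothetical stand-ins, not the PyTorch implementation.

```python
class FakeBenchmarkRequest:
    """Hypothetical stand-in for a benchmark request; not the PyTorch class."""

    def __init__(self, should_fail: bool = False) -> None:
        self.should_fail = should_fail
        self.cleaned_up = False

    def make_run_fn(self):
        # Pretend to load a compiled kernel and return a callable that runs it.
        def run_fn() -> float:
            if self.should_fail:
                raise RuntimeError("simulated kernel failure")
            return 1.0

        return run_fn

    def cleanup_run_fn(self) -> None:
        # A real implementation would release the loaded library here.
        self.cleaned_up = True

    def benchmark(self) -> float:
        run_fn = self.make_run_fn()
        try:
            return run_fn()
        finally:
            # Executes whether run_fn() returned normally or raised.
            self.cleanup_run_fn()
```

With this shape, the exception from `run_fn()` still propagates to the caller, but the cleanup step has already run by the time it does.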

Differential Revision: [D74629230](https://our.internmc.facebook.com/intern/diff/D74629230/)

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov

henrylhtsang added a commit that referenced this pull request May 15, 2025
Pull Request resolved: #153457


ghstack-source-id: 284104202
@exported-using-ghexport

Differential Revision: [D74629230](https://our.internmc.facebook.com/intern/diff/D74629230/)


if not isinstance(choice, CUDATemplateCaller):
    log.error(
        "CUDA compilation error during autotuning: \n%s. \nIgnoring this choice.",
Contributor

Is this a compilation error or a runtime error?

Comment on lines 736 to 737
def dummy_function():
    raise RuntimeError(err_msg)
Contributor

For readability, change the function name to something more descriptive.

Maybe `raise_runtime_error()`.

Comment on lines 2281 to 2291
if isinstance(choice, CUDATemplateCaller):
    log.debug(
        "Runtime error during autotuning: \n%s. \nIgnoring this choice.",
        msg,
        exc_info=True,
    )
else:
    if "illegal memory access" in msg:
        msg += "\n\nEither error in template or triton bug.\n"
    log.error(
        "Runtime error during autotuning: \n%s. \nIgnoring this choice.",
        msg,
    )
log.error(
    "Runtime error during autotuning: \n%s. \nIgnoring this choice.",
    msg,
)
Contributor

It'd be easier to read if we logged this just once and changed the log level and `exc_info` depending on whether it came from `CUDATemplateCaller`.

Contributor Author

> it'd be easier to read if we just log this once, but change the log-level and exc_info depending on if it came from CUDATemplateCaller or not.

I thought about it a few times; I don't think there is an easy way to do it.
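
For reference, one possible shape of the single-call version discussed in this thread, using the standard library's `Logger.log(level, ...)`. This is a sketch under assumptions, not the code in this PR: the helper `log_runtime_error`, the logger name, and the placeholder `CUDATemplateCaller` class are hypothetical, and whether this fits the surrounding autotuning code is a judgment call.

```python
import logging

log = logging.getLogger("autotune_sketch")  # hypothetical logger name


class CUDATemplateCaller:
    """Hypothetical placeholder; the real class lives in PyTorch Inductor."""


def log_runtime_error(choice: object, msg: str) -> int:
    """Log an autotuning runtime error once and return the level used."""
    is_cutlass = isinstance(choice, CUDATemplateCaller)
    # The extra hint applies only to the non-CUTLASS path, as in the diff.
    if not is_cutlass and "illegal memory access" in msg:
        msg += "\n\nEither error in template or triton bug.\n"
    level = logging.DEBUG if is_cutlass else logging.ERROR
    log.log(
        level,
        "Runtime error during autotuning: \n%s. \nIgnoring this choice.",
        msg,
        exc_info=is_cutlass,  # traceback only for the CUTLASS path
    )
    return level
```

The trade-off is that the branch on `CUDATemplateCaller` does not disappear; it moves into the choice of `level`, `exc_info`, and the message suffix.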

henrylhtsang requested a review from mlazos May 16, 2025 17:26
henrylhtsang added a commit that referenced this pull request May 16, 2025
Pull Request resolved: #153457


ghstack-source-id: 284547176
@exported-using-ghexport

Differential Revision: [D74629230](https://our.internmc.facebook.com/intern/diff/D74629230/)