[CUDA] Use appropriate return code for out of registers kernel launch #1318
Codecov Report
All modified and coverable lines are covered by tests ✅

@@           Coverage Diff           @@
##             main    #1318   +/-   ##
=======================================
  Coverage   15.39%   15.39%
=======================================
  Files         240      240
  Lines       34099    34122   +23
  Branches     3775     3779    +4
=======================================
+ Hits         5250     5254    +4
- Misses      28798    28818   +20
+ Partials       51       50    -1
b85dbe2 to 301767f
Change the returned error code for exiting the kernel launch entry point in CUDA when exceeding the maximum available registers for execution on the SM. Previously we were returning a misleading error code
Put that in your commit message, please
301767f to a12bc66
Good shout. Reworded now.
a12bc66 to eec7aa4
Change the returned error code for exiting the kernel launch entry point in CUDA when exceeding the maximum available registers for execution on the SM. Previously we were returning a misleading error code.
eec7aa4 to 78a71f3
#12604) This PR improves error handling by specializing `PI_ERROR_OUT_OF_RESOURCES`. Previously, the CUDA backend handled the out-of-resources launch error (raised when a kernel exceeds the available registers) as an invalid work-group size error. Paired with the UR adapter change oneapi-src/unified-runtime#1318, which returns the correct error code, the new specialized handling no longer shows users a misleading error message. It also adds a fallback message for the generic out-of-resources error codes returned from APIs (e.g. for kernel launch). Fixes issue: oneapi-src/unified-runtime#1308
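The runtime-side behaviour described in that commit (a specialized message for the register case plus a generic fallback) might look roughly like the sketch below. It is not the DPC++ implementation: the `ErrorCode` enum, the `describeLaunchError` function, the `registersExceeded` flag, and all message strings are hypothetical names chosen for illustration.

```cpp
#include <string>

// Hypothetical stand-in for the runtime's error codes; the real code uses
// PI_ERROR_* values, which are not reproduced here.
enum class ErrorCode { Success, InvalidWorkGroupSize, OutOfResources };

// Turns a launch error into a user-facing message.
// `registersExceeded` is an assumed flag saying whether the runtime could
// confirm that the kernel needs more registers than the device provides.
std::string describeLaunchError(ErrorCode code, bool registersExceeded) {
  switch (code) {
  case ErrorCode::Success:
    return "";
  case ErrorCode::InvalidWorkGroupSize:
    return "Invalid work-group size.";
  case ErrorCode::OutOfResources:
    // Specialized handling: name the actual cause when it is known.
    if (registersExceeded)
      return "Kernel launch exceeds the registers available on the device.";
    // Fallback for the generic out-of-resources case.
    return "Kernel launch failed: out of resources.";
  }
  return "Unknown error.";
}
```

Keeping the fallback separate from the register-specific branch means a generic out-of-resources code still produces a message that does not point at the work-group size.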
This PR changes the returned error code for exiting the kernel launch entry point in CUDA when exceeding the maximum available registers for execution on the SM. Previously we were returning a misleading error code.
This change requires DPC++ to adapt its handling of this specific error; see PR intel/llvm#12604.
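For readers unfamiliar with the failure mode, here is a minimal sketch of the idea behind the change, not the adapter's actual code. The `Result` enum, `LaunchRequest` struct, and both functions are hypothetical stand-ins so the example compiles without the CUDA or Unified Runtime headers, and the per-block register limit is an assumption about how the check is expressed.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the adapter's result codes.
enum class Result { Success, InvalidWorkGroupSize, OutOfResources };

// Illustrative launch parameters; field names are assumptions.
struct LaunchRequest {
  std::uint32_t regsPerThread;   // registers the compiled kernel needs per thread
  std::uint32_t threadsPerBlock; // total threads in the requested block
};

// True when the whole block needs more registers than the device offers
// per block.
bool exceedsRegisterLimit(const LaunchRequest &req,
                          std::uint32_t maxRegsPerBlock) {
  // Widen to 64 bits before multiplying so the product cannot overflow.
  return static_cast<std::uint64_t>(req.regsPerThread) * req.threadsPerBlock >
         maxRegsPerBlock;
}

// Validates a launch and reports each failure with its own code.
Result checkLaunch(const LaunchRequest &req, std::uint32_t maxRegsPerBlock,
                   std::uint32_t maxThreadsPerBlock) {
  if (req.threadsPerBlock > maxThreadsPerBlock)
    return Result::InvalidWorkGroupSize;
  if (exceedsRegisterLimit(req, maxRegsPerBlock))
    // A distinct code, rather than one that blames the work-group size.
    return Result::OutOfResources;
  return Result::Success;
}
```

With a distinct result for the register case, a caller can tell "this block shape is not allowed" apart from "this kernel is too register-heavy for this block shape" and report each accordingly.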