[CUDA] Use appropriate return code for out of registers kernel launch by GeorgeWeb · Pull Request #1318 · oneapi-src/unified-runtime · GitHub

[CUDA] Use appropriate return code for out of registers kernel launch #1318

@GeorgeWeb (Contributor) commented Feb 5, 2024

This PR changes the error code returned from the CUDA kernel launch entry point when a launch exceeds the maximum number of registers available for execution on the SM. Previously we returned a misleading error code.

This change requires DPC++ to adapt its handling of this specific error; see PR intel/llvm#12604.
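
To make the intent concrete, here is a minimal sketch of the kind of check involved. It is not the adapter's actual code: `LaunchResult`, `KernelInfo`, `DeviceInfo` and `checkLaunch` are hypothetical stand-ins, and the real result enumerators live in the Unified Runtime headers. The only behaviour taken from this PR is that register exhaustion is reported as an out-of-resources error rather than as an invalid work-group size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the adapter's result codes.
enum class LaunchResult {
  Success,
  InvalidWorkGroupSize, // what register exhaustion was reported as before
  OutOfResources,       // what register exhaustion is reported as now
};

// Hypothetical per-kernel and per-device limits (assumed field names).
struct KernelInfo {
  std::uint32_t regsPerThread;
};
struct DeviceInfo {
  std::uint32_t maxRegsPerBlock;
  std::uint32_t maxThreadsPerBlock;
};

// Validate a launch configuration before handing it to the driver.
inline LaunchResult checkLaunch(const KernelInfo &kernel,
                                const DeviceInfo &device,
                                std::uint32_t threadsPerBlock) {
  assert(threadsPerBlock > 0 && "a launch needs at least one thread per block");

  // A block larger than the device allows really is a work-group size error.
  if (threadsPerBlock > device.maxThreadsPerBlock)
    return LaunchResult::InvalidWorkGroupSize;

  // Use 64-bit arithmetic so the product cannot overflow.
  const std::uint64_t regsNeeded =
      std::uint64_t{kernel.regsPerThread} * threadsPerBlock;

  // The block size is legal, but the kernel needs more registers than the
  // SM can provide for it: a resource problem, not a size problem.
  if (regsNeeded > device.maxRegsPerBlock)
    return LaunchResult::OutOfResources;

  return LaunchResult::Success;
}
```

With a kernel using 128 registers per thread on a device that offers 65536 registers per block, a 512-thread block fits exactly, while a 1024-thread block is a valid size that still fails with `OutOfResources`.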

@codecov-commenter commented

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (8c604b0) 15.39% compared to head (b85dbe2) 15.39%.
Report is 20 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1318   +/-   ##
=======================================
  Coverage   15.39%   15.39%           
=======================================
  Files         240      240           
  Lines       34099    34122   +23     
  Branches     3775     3779    +4     
=======================================
+ Hits         5250     5254    +4     
- Misses      28798    28818   +20     
+ Partials       51       50    -1     

@GeorgeWeb GeorgeWeb force-pushed the georgi/adapter-cuda-out-of-resources-registers-errc branch from b85dbe2 to 301767f Compare February 5, 2024 15:45
@GeorgeWeb GeorgeWeb marked this pull request as ready for review February 6, 2024 12:42
@GeorgeWeb GeorgeWeb requested a review from a team as a code owner February 6, 2024 12:42
@GeorgeWeb GeorgeWeb requested a review from ldrumm February 6, 2024 12:42
@ldrumm (Contributor) left a comment

> Change the returned error code for exiting the kernel launch entry point in CUDA when exceeding the maximum available registers for execution on the SM. Previously we were returning a misleading error code

Put that in your commit message, please

@GeorgeWeb GeorgeWeb force-pushed the georgi/adapter-cuda-out-of-resources-registers-errc branch from 301767f to a12bc66 Compare February 6, 2024 13:16
@GeorgeWeb (Contributor, Author) commented

> > Change the returned error code for exiting the kernel launch entry point in CUDA when exceeding the maximum available registers for execution on the SM. Previously we were returning a misleading error code
>
> Put that in your commit message, please

Good shout. Reworded now.

@GeorgeWeb GeorgeWeb force-pushed the georgi/adapter-cuda-out-of-resources-registers-errc branch from a12bc66 to eec7aa4 Compare February 6, 2024 13:20
Change the returned error code for exiting the kernel launch entry point in CUDA
when exceeding the maximum available registers for execution on the SM.
Previously we were returning a misleading error code.
@GeorgeWeb GeorgeWeb force-pushed the georgi/adapter-cuda-out-of-resources-registers-errc branch from eec7aa4 to 78a71f3 Compare March 28, 2024 14:27
@kbenzie kbenzie added the cuda CUDA adapter specific issues label Apr 10, 2024
@GeorgeWeb GeorgeWeb added the ready to merge Added to PR's which are ready to merge label May 20, 2024
@kbenzie kbenzie merged commit a97eed1 into oneapi-src:main May 31, 2024
sommerlukas pushed a commit to intel/llvm that referenced this pull request Jun 3, 2024
#12604)

This PR improves the handling of errors by specializing
`PI_ERROR_OUT_OF_RESOURCES`.

Previously, the CUDA backend handled the out-of-resources launch error
(for exceeded registers) as an invalid work-group size error. Paired with
the UR adapter change oneapi-src/unified-runtime#1318, which returns the
correct error code, the new specialized handling no longer outputs a
misleading error message to users.
This change also adds a fallback message for the generic out-of-resources
error codes returned from APIs (e.g. for kernel launch).

Fixes issue: oneapi-src/unified-runtime#1308
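
The specialized handling described in that commit message could look roughly like the sketch below. This is an illustration under assumptions, not the DPC++ implementation: `PiError` is a hypothetical stand-in for the real PI error codes, and `describeLaunchError`, its `exceededRegisters` flag and the message strings are all invented for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the PI error codes mentioned above.
enum class PiError { Success, InvalidWorkGroupSize, OutOfResources };

// Pick a user-facing message for a failed kernel launch.
// `exceededRegisters` is an assumed flag telling us whether the kernel's
// register demand is known to exceed what the device offers.
inline std::string describeLaunchError(PiError err, bool exceededRegisters) {
  assert(err != PiError::Success && "only call this for failed launches");

  switch (err) {
  case PiError::OutOfResources:
    // Specialized message when the cause is known to be register pressure.
    if (exceededRegisters)
      return "Kernel launch exceeds the registers available on the device";
    // Fallback for any other out-of-resources failure.
    return "Kernel launch failed: the device is out of resources";
  case PiError::InvalidWorkGroupSize:
    return "Kernel launch failed: invalid work-group size";
  default:
    return "Kernel launch failed";
  }
}
```

The point of the split is that an out-of-resources code no longer falls into the work-group size branch, so the user is told about the resource limit instead of being asked to fix a size that may already be valid.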
Labels
cuda CUDA adapter specific issues ready to merge Added to PR's which are ready to merge