[CUDA] Use appropriate return code for out of registers kernel launch #1318

GeorgeWeb · 2024-02-05T11:23:05Z

This PR changes the error code returned from the CUDA kernel launch entry point when a launch exceeds the maximum number of registers available for execution on the SM. Previously we were returning a misleading error code.

This change requires DPC++ to adapt its handling of this specific error; see PR intel/llvm#12604.
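For readers skimming the thread, the intent of the change can be sketched roughly as follows. This is an illustrative stand-alone sketch, not the adapter's actual code: `LaunchResult`, `exceedsRegisterBudget`, and `checkLaunch` are hypothetical names, and the register check (registers per thread times block size compared against a per-block limit) is an assumption about how such a check might be written.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the adapter's result codes (not the real UR enum).
enum class LaunchResult {
  Success,
  InvalidWorkGroupSize,
  OutOfResources,
};

// Assumed check: a block of BlockSize threads needs
// RegsPerThread * BlockSize registers, which must fit in the per-block limit.
bool exceedsRegisterBudget(std::size_t RegsPerThread, std::size_t BlockSize,
                           std::size_t MaxRegsPerBlock) {
  return RegsPerThread * BlockSize > MaxRegsPerBlock;
}

// Validates a launch configuration and picks the error code to report.
LaunchResult checkLaunch(std::size_t RegsPerThread, std::size_t BlockSize,
                         std::size_t MaxThreadsPerBlock,
                         std::size_t MaxRegsPerBlock) {
  // A block size outside the device limits is a genuine work-group size error.
  if (BlockSize == 0 || BlockSize > MaxThreadsPerBlock)
    return LaunchResult::InvalidWorkGroupSize;

  // Running out of registers is a resource problem, so report it as such
  // instead of reusing the work-group size error.
  if (exceedsRegisterBudget(RegsPerThread, BlockSize, MaxRegsPerBlock))
    return LaunchResult::OutOfResources;

  return LaunchResult::Success;
}
```

With these assumed limits, `checkLaunch(128, 1024, 1024, 65536)` reports `OutOfResources`, whereas an oversized block such as `checkLaunch(32, 2048, 1024, 65536)` still reports `InvalidWorkGroupSize`.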

codecov-commenter · 2024-02-05T12:17:41Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (8c604b0) 15.39% compared to head (b85dbe2) 15.39%.
Report is 20 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1318   +/-   ##
=======================================
  Coverage   15.39%   15.39%           
=======================================
  Files         240      240           
  Lines       34099    34122   +23     
  Branches     3775     3779    +4     
=======================================
+ Hits         5250     5254    +4     
- Misses      28798    28818   +20     
+ Partials       51       50    -1

ldrumm

Change the returned error code for exiting the kernel launch entry point in CUDA when exceeding the maximum available registers for execution on the SM. Previously we were returning a misleading error code

Put that in your commit message, please

GeorgeWeb · 2024-02-06T13:17:53Z

Change the returned error code for exiting the kernel launch entry point in CUDA when exceeding the maximum available registers for execution on the SM. Previously we were returning a misleading error code

Put that in your commit message, please

Good shout. Reworded now.

Change the returned error code for exiting the kernel launch entry point in CUDA when exceeding the maximum available registers for execution on the SM. Previously we were returning a misleading error code.

#12604) This PR improves the handling of errors by specializing `PI_ERROR_OUT_OF_RESOURCES`. Previously, the CUDA backend handled the out-of-resources launch error (for exceeded registers) as an invalid work-group size error. Now that the new specialized handling is paired with the UR adapter change oneapi-src/unified-runtime#1318, which returns the correct error code, we no longer output a misleading error message to users. This PR also adds a fallback message for the generic out-of-resources error codes returned from APIs (e.g. for kernel launch). Fixes issue: oneapi-src/unified-runtime#1308
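The runtime-side handling described above could look roughly like the sketch below. It is only an illustration under stated assumptions: `Backend`, `ErrorCode`, and `describeLaunchError` are hypothetical names, and the message strings are invented for the example rather than taken from DPC++.

```cpp
#include <string>

// Hypothetical stand-ins; these are not the DPC++ or PI types.
enum class Backend { Cuda, Other };
enum class ErrorCode { InvalidWorkGroupSize, OutOfResources };

// Maps a launch error to a user-facing message. The out-of-resources code
// gets a CUDA-specific message and a generic fallback for other backends.
std::string describeLaunchError(Backend B, ErrorCode E) {
  switch (E) {
  case ErrorCode::OutOfResources:
    if (B == Backend::Cuda)
      return "kernel launch exceeded the registers available on the device";
    return "kernel launch failed: the device is out of resources";
  case ErrorCode::InvalidWorkGroupSize:
    return "kernel launch failed: invalid work-group size";
  }
  return "kernel launch failed: unknown error";
}
```

Keeping the specialized message behind the backend check means other backends that return the same generic code still get a sensible, if less specific, message.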

GeorgeWeb mentioned this pull request Feb 5, 2024

[SYCL][CUDA] Improve kernel launch error handling for out-of-registers intel/llvm#12604

Merged

GeorgeWeb force-pushed the georgi/adapter-cuda-out-of-resources-registers-errc branch from b85dbe2 to 301767f Compare February 5, 2024 15:45

GeorgeWeb marked this pull request as ready for review February 6, 2024 12:42

GeorgeWeb requested a review from a team as a code owner February 6, 2024 12:42

GeorgeWeb requested a review from ldrumm February 6, 2024 12:42

ldrumm requested changes Feb 6, 2024

GeorgeWeb force-pushed the georgi/adapter-cuda-out-of-resources-registers-errc branch from 301767f to a12bc66 Compare February 6, 2024 13:16

GeorgeWeb force-pushed the georgi/adapter-cuda-out-of-resources-registers-errc branch from a12bc66 to eec7aa4 Compare February 6, 2024 13:20

ldrumm approved these changes Feb 6, 2024

rafbiels mentioned this pull request Feb 8, 2024

[CUDA] Max local mem size check should return OUT_OF_RESOURCES #1322

Open

[CUDA] Use appropriate return code for out of registers kernel launch

78a71f3

Change the returned error code for exiting the kernel launch entry point in CUDA when exceeding the maximum available registers for execution on the SM. Previously we were returning a misleading error code.

GeorgeWeb force-pushed the georgi/adapter-cuda-out-of-resources-registers-errc branch from eec7aa4 to 78a71f3 Compare March 28, 2024 14:27

kbenzie added the cuda label (CUDA adapter specific issues) Apr 10, 2024

GeorgeWeb added the ready to merge label (Added to PR's which are ready to merge) May 20, 2024

kbenzie merged commit a97eed1 into oneapi-src:main May 31, 2024
