musa: override warp_size of musa device to 32 by yeahdongcn · Pull Request #12445 · ggml-org/llama.cpp · GitHub

musa: override warp_size of musa device to 32 #12445


Merged: 1 commit merged into ggml-org:master on Mar 18, 2025

Conversation

@yeahdongcn (Collaborator) commented on Mar 18, 2025:


llama.cpp's MUSA build encounters a runtime error after commit 10f2e81.

This PR resolves the issue by overriding the warp_size of the MUSA device to 32.
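For reference, the shape of the fix as a minimal sketch rather than the verbatim patch: pin the host-side warp size when building for MUSA instead of trusting the value reported by the runtime. The helper name set_device_warp_size is illustrative only; the info.devices[id].warp_size field follows the ggml_cuda_info().devices[device].warp_size accessor discussed in the review below.

```cpp
// Minimal sketch, not the verbatim patch. In ggml-cuda, GGML_USE_MUSA
// maps the cuda* runtime symbols to their musa* equivalents, so the
// same code path serves both backends.
static void set_device_warp_size(ggml_cuda_device_info & info, int id) {
    cudaDeviceProp prop;
    CUDA_CHECK(cudaGetDeviceProperties(&prop, id));
#ifdef GGML_USE_MUSA
    // MTT S80 reports prop.warpSize == 128, which disagrees with the 32
    // assumed by device-side code; force 32 across the board for now.
    info.devices[id].warp_size = 32;
#else
    info.devices[id].warp_size = prop.warpSize;
#endif
}
```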

Testing Done

  • test-backend-ops on MTT S80
  • ./build/bin/llama-cli -m ~/models/deepseek-r1_7b_q4_0.gguf -ngl 999 on MTT S80

@yeahdongcn (Collaborator, Author) commented:

@JohannesGaessler @IMbackK Could you please review this PR? Thanks.

github-actions bot added labels: Nvidia GPU (Issues specific to Nvidia GPUs), ggml (changes relating to the ggml tensor library for machine learning) on Mar 18, 2025
@IMbackK (Collaborator) commented on Mar 18, 2025:

No, this is not a proper fix. I presume ggml_cuda_info().devices[device].warp_size contains something other than 32 on MUSA, and since ggml_cuda_get_physical_warp_size has no case for MUSA, it returns 32.

We generally need the warp size in host and device code to be aligned. Either you make ggml_cuda_get_physical_warp_size return whatever the real warp size is for MUSA, or you make ggml_cuda_info().devices[device].warp_size contain 32 globally. The first is of course better for performance.
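For context, ggml_cuda_get_physical_warp_size is a compile-time, device-side helper along these lines (a sketch of ggml/src/ggml-cuda/common.cuh as of this PR; the exact HIP arch guards are from memory and may differ). The point is that it has no MUSA branch, so device code assumes 32 lanes even where the hardware warp is wider:

```cpp
static constexpr __device__ int ggml_cuda_get_physical_warp_size() {
#if defined(GGML_USE_HIP) && (defined(__GFX8__) || defined(__GFX9__))
    return 64;  // GCN-era AMD wavefronts are 64 lanes wide
#else
    return 32;  // CUDA (and, implicitly, MUSA) fall through to 32
#endif
}
```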

@yeahdongcn (Collaborator, Author) commented on Mar 18, 2025:

> No, this is not a proper fix. I presume ggml_cuda_info().devices[device].warp_size contains something other than 32 on MUSA, and since ggml_cuda_get_physical_warp_size has no case for MUSA, it returns 32.
>
> We generally need the warp size in host and device code to be aligned. Either you make ggml_cuda_get_physical_warp_size return whatever the real warp size is for MUSA, or you make ggml_cuda_info().devices[device].warp_size contain 32 globally. The first is of course better for performance.

Thanks for reviewing this PR. On our earlier models (MTT S80), prop.warpSize is 128, but it is back to 32 on the latest models. This PR serves as a temporary workaround (since llama.cpp Docker images are currently non-functional on MUSA), and we are actively exploring a more robust solution to ensure compatibility across all generations.
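For anyone reproducing this: the runtime-reported value can be checked directly, assuming the MUSA runtime mirrors the CUDA runtime API (musa_runtime.h, musaGetDeviceProperties, musaDeviceProp, musaSuccess follow MUSA's CUDA-style naming and should be verified against the local SDK):

```cpp
#include <musa_runtime.h>
#include <cstdio>

int main() {
    musaDeviceProp prop;
    if (musaGetDeviceProperties(&prop, /*device=*/0) != musaSuccess) {
        fprintf(stderr, "failed to query device 0\n");
        return 1;
    }
    // Per the discussion above: 128 on MTT S80, 32 on newer generations.
    printf("%s: warpSize = %d\n", prop.name, prop.warpSize);
    return 0;
}
```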

@IMbackK (Collaborator) commented on Mar 18, 2025:

This PR doesn't resolve the underlying issue that ggml_cuda_info().devices[device].warp_size is misaligned with ggml_cuda_get_physical_warp_size. That misalignment causes more issues than the one you worked around in this PR; for instance, mmv is also affected.

You need to either override ggml_cuda_info().devices[device].warp_size to 32 globally on MUSA, or make ggml_cuda_get_physical_warp_size return 128 on the S80 and 32 on later models.

There is no other option.
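A hypothetical sketch of the second option, purely for illustration, extending the helper shown earlier: the MUSA arch macro and the threshold separating the S80 (QY1) generation from later ones are assumptions, not verified values.

```cpp
// Hypothetical: add a MUSA branch distinguishing the 128-lane S80 from
// later 32-lane generations. __MUSA_ARCH__ is assumed to behave like
// __CUDA_ARCH__; the cutoff 220 is a placeholder, not a verified number.
static constexpr __device__ int ggml_cuda_get_physical_warp_size() {
#if defined(GGML_USE_HIP) && (defined(__GFX8__) || defined(__GFX9__))
    return 64;
#elif defined(GGML_USE_MUSA) && defined(__MUSA_ARCH__) && __MUSA_ARCH__ < 220
    return 128;  // MTT S80 (QY1)
#else
    return 32;
#endif
}
```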

@yeahdongcn (Collaborator, Author) commented:

> This PR doesn't resolve the underlying issue that ggml_cuda_info().devices[device].warp_size is misaligned with ggml_cuda_get_physical_warp_size. That misalignment causes more issues than the one you worked around in this PR; for instance, mmv is also affected.
>
> You need to either override ggml_cuda_info().devices[device].warp_size to 32 globally on MUSA, or make ggml_cuda_get_physical_warp_size return 128 on the S80 and 32 on later models.
>
> There is no other option.

No problem! I'll update the PR accordingly.

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
@yeahdongcn changed the title from "musa: use fixed warp size (32) in mul_mat_vec_q_cuda" to "musa: override warp_size of musa device to 32" on Mar 18, 2025
@IMbackK (Collaborator) left a comment:

This variant is ok

@IMbackK merged commit bb115d2 into ggml-org:master on Mar 18, 2025. 47 checks passed.