musa: override warp_size of musa device to 32 by yeahdongcn · Pull Request #12445 · ggml-org/llama.cpp · GitHub

musa: override warp_size of musa device to 32 #12445


Merged: 1 commit merged into ggml-org:master on Mar 18, 2025

Conversation

@yeahdongcn (Collaborator) commented on Mar 18, 2025:


llama.cpp's MUSA build encounters a runtime error after commit 10f2e81.

This PR resolves the issue by overriding the warp_size of the MUSA device to 32.
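For reference, the shape of the fix as a minimal sketch rather than the verbatim patch: pin the host-side warp size when building for MUSA instead of trusting the value reported by the runtime. The helper name set_device_warp_size is illustrative only; the info.devices[id].warp_size field follows the ggml_cuda_info().devices[device].warp_size accessor discussed in the review below.

```cpp
// Minimal sketch, not the verbatim patch. In ggml-cuda, GGML_USE_MUSA
// maps the cuda* runtime symbols to their musa* equivalents, so the
// same code path serves both backends.
static void set_device_warp_size(ggml_cuda_device_info & info, int id) {
    cudaDeviceProp prop;
    CUDA_CHECK(cudaGetDeviceProperties(&prop, id));
#ifdef GGML_USE_MUSA
    // MTT S80 reports prop.warpSize == 128, which disagrees with the 32
    // assumed by device-side code; force 32 across the board for now.
    info.devices[id].warp_size = 32;
#else
    info.devices[id].warp_size = prop.warpSize;
#endif
}
```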

Testing Done

  • test-backend-ops on MTT S80
  • ./build/bin/llama-cli -m ~/models/deepseek-r1_7b_q4_0.gguf -ngl 999 on MTT S80

@yeahdongcn (Collaborator, Author) commented:

@JohannesGaessler @IMbackK Could you please review this PR? Thanks.

github-actions bot added labels: Nvidia GPU (Issues specific to Nvidia GPUs), ggml (changes relating to the ggml tensor library for machine learning) on Mar 18, 2025
@IMbackK (Collaborator) commented on Mar 18, 2025:

No, this is not a proper fix. I presume ggml_cuda_info().devices[device].warp_size contains something other than 32 on MUSA, and since ggml_cuda_get_physical_warp_size has no case for MUSA, it returns 32.

We generally need the warp size in host and device code to be aligned. Either you make ggml_cuda_get_physical_warp_size return whatever the real warp size is for MUSA, or you make ggml_cuda_info().devices[device].warp_size contain 32 globally. The first is of course better for performance.
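For context, ggml_cuda_get_physical_warp_size is a compile-time, device-side helper along these lines (a sketch of ggml/src/ggml-cuda/common.cuh as of this PR; the exact HIP arch guards are from memory and may differ). The point is that it has no MUSA branch, so device code assumes 32 lanes even where the hardware warp is wider:

```cpp
static constexpr __device__ int ggml_cuda_get_physical_warp_size() {
#if defined(GGML_USE_HIP) && (defined(__GFX8__) || defined(__GFX9__))
    return 64;  // GCN-era AMD wavefronts are 64 lanes wide
#else
    return 32;  // CUDA (and, implicitly, MUSA) fall through to 32
#endif
}
```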

@yeahdongcn (Collaborator, Author) commented on Mar 18, 2025:

> No, this is not a proper fix. I presume ggml_cuda_info().devices[device].warp_size contains something other than 32 on MUSA, and since ggml_cuda_get_physical_warp_size has no case for MUSA, it returns 32.
>
> We generally need the warp size in host and device code to be aligned. Either you make ggml_cuda_get_physical_warp_size return whatever the real warp size is for MUSA, or you make ggml_cuda_info().devices[device].warp_size contain 32 globally. The first is of course better for performance.

Thanks for reviewing this PR. On our earlier models (MTT S80), prop.warpSize is 128, but it is back to 32 on the latest models. This PR serves as a temporary workaround (since llama.cpp Docker images are currently non-functional on MUSA), and we are actively exploring a more robust solution to ensure compatibility across all generations.
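For anyone reproducing this: the runtime-reported value can be checked directly, assuming the MUSA runtime mirrors the CUDA runtime API (musa_runtime.h, musaGetDeviceProperties, musaDeviceProp, musaSuccess follow MUSA's CUDA-style naming and should be verified against the local SDK):

```cpp
#include <musa_runtime.h>
#include <cstdio>

int main() {
    musaDeviceProp prop;
    if (musaGetDeviceProperties(&prop, /*device=*/0) != musaSuccess) {
        fprintf(stderr, "failed to query device 0\n");
        return 1;
    }
    // Per the discussion above: 128 on MTT S80, 32 on newer generations.
    printf("%s: warpSize = %d\n", prop.name, prop.warpSize);
    return 0;
}
```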

@IMbackK (Collaborator) commented on Mar 18, 2025:

This PR doesn't resolve the underlying issue that ggml_cuda_info().devices[device].warp_size is misaligned with ggml_cuda_get_physical_warp_size. That misalignment causes more issues than the one you worked around in this PR; for instance, mmv is also affected.

You need to either override ggml_cuda_info().devices[device].warp_size to 32 globally on MUSA, or make ggml_cuda_get_physical_warp_size return 128 on the S80 and 32 on later models.

There is no other option.
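A hypothetical sketch of the second option, purely for illustration, extending the helper shown earlier: the MUSA arch macro and the threshold separating the S80 (QY1) generation from later ones are assumptions, not verified values.

```cpp
// Hypothetical: add a MUSA branch distinguishing the 128-lane S80 from
// later 32-lane generations. __MUSA_ARCH__ is assumed to behave like
// __CUDA_ARCH__; the cutoff 220 is a placeholder, not a verified number.
static constexpr __device__ int ggml_cuda_get_physical_warp_size() {
#if defined(GGML_USE_HIP) && (defined(__GFX8__) || defined(__GFX9__))
    return 64;
#elif defined(GGML_USE_MUSA) && defined(__MUSA_ARCH__) && __MUSA_ARCH__ < 220
    return 128;  // MTT S80 (QY1)
#else
    return 32;
#endif
}
```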

@yeahdongcn (Collaborator, Author) commented:

> This PR doesn't resolve the underlying issue that ggml_cuda_info().devices[device].warp_size is misaligned with ggml_cuda_get_physical_warp_size. That misalignment causes more issues than the one you worked around in this PR; for instance, mmv is also affected.
>
> You need to either override ggml_cuda_info().devices[device].warp_size to 32 globally on MUSA, or make ggml_cuda_get_physical_warp_size return 128 on the S80 and 32 on later models.
>
> There is no other option.

No problem! I'll update the PR accordingly.

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
@yeahdongcn changed the title from "musa: use fixed warp size (32) in mul_mat_vec_q_cuda" to "musa: override warp_size of musa device to 32" on Mar 18, 2025
@IMbackK (Collaborator) left a comment:

This variant is ok

@IMbackK merged commit bb115d2 into ggml-org:master on Mar 18, 2025. 47 checks passed.