musa: override warp_size of musa device to 32 #12445
Conversation
@JohannesGaessler @IMbackK Could you please review this PR? Thanks.
No, this is not a proper fix. I presume ggml_cuda_info().devices[device].warp_size contains something other than 32 on MUSA, and since ggml_cuda_get_physical_warp_size has no case for MUSA it returns 32. We generally need the warp size in host and device code to be aligned: either make ggml_cuda_get_physical_warp_size return whatever the real warp size is on MUSA, or make ggml_cuda_info().devices[device].warp_size contain 32 globally. The first option is of course better for performance.
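For reference, a simplified sketch of the device-side constant being discussed, modeled on ggml/src/ggml-cuda/common.cuh (the exact preprocessor guards upstream may differ):

```cpp
// Warp size baked into the kernels at compile time. Without a MUSA-specific
// branch, MUSA builds fall through to the default of 32, even if the runtime
// reports a different warpSize to the host code.
static constexpr __device__ int ggml_cuda_get_physical_warp_size() {
#if defined(GGML_USE_HIP) && (defined(__GFX8__) || defined(__GFX9__))
    return 64; // HIP wave64 architectures
#else
    return 32; // NVIDIA, RDNA, and (currently) MUSA
#endif
}
```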
Thanks for reviewing this PR. On our earlier models (MTT S80), the warp size is 128, while later models use a warp size of 32.
This PR doesn't resolve the underlying issue that ggml_cuda_info().devices[device].warp_size is misaligned with ggml_cuda_get_physical_warp_size. That causes more issues than the one you worked around in this PR; for instance, mmv is also affected. You need to either override ggml_cuda_info().devices[device].warp_size to 32 globally on MUSA, or make ggml_cuda_get_physical_warp_size return 128 on the S80 and 32 on later models. There is no other option.
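A minimal sketch of the first option, overriding the host-side value during device enumeration (loosely modeled on the per-device loop in ggml_cuda_init() in ggml/src/ggml-cuda/ggml-cuda.cu; field and variable names are approximate, and in MUSA builds the cuda* symbols are mapped to their musa* equivalents by the compatibility headers):

```cpp
// Inside the per-device initialization loop:
cudaDeviceProp prop;
CUDA_CHECK(cudaGetDeviceProperties(&prop, id));
#if defined(GGML_USE_MUSA)
    // The MUSA runtime can report warpSize = 128, but the kernels are compiled
    // with ggml_cuda_get_physical_warp_size() == 32, so pin the host-side value
    // to 32 to keep host and device code aligned.
    info.devices[id].warp_size = 32;
#else
    info.devices[id].warp_size = prop.warpSize;
#endif
```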
No problem! I'll update the PR accordingly.
Force-pushed from 944e1fe to 5c43a3f (Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>)
This variant is ok
llama.cpp's `musa` build encounters a runtime error after commit 10f2e81. This PR resolves the issue by overriding `warp_size` of the MUSA device to `32`.

Testing Done
- `test-backend-ops` on MTT S80
- `./build/bin/llama-cli -m ~/models/deepseek-r1_7b_q4_0.gguf -ngl 999` on MTT S80