musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENTITY op to accelerate D2D memory copy #13647
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
You are replacing only the case where the memory of one tensor is copied to another tensor as one contiguous block. I would have intuitively assumed that a memcpy performs quite well in that scenario. How much faster is the mudnn implementation?
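To illustrate the case the comment refers to, here is a minimal host-side sketch of that kind of dispatch. It is not the code from this PR: the `Tensor` struct, `make_contiguous`, `is_contiguous`, `n_bytes`, and `copy_tensor` are hypothetical names, and `std::memcpy` stands in for whatever device-to-device copy (a plain memcpy or an accelerated library op) the backend would call. Only the branch structure is the point: the fast path is taken solely when both tensors are contiguous and equally sized, and everything else falls through to a generic strided copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical, simplified tensor: 4 dims with per-dimension byte strides.
struct Tensor {
    int64_t  ne[4];      // number of elements in each dimension
    size_t   nb[4];      // stride in bytes for each dimension
    size_t   elem_size;  // size of one element in bytes
    uint8_t *data;
};

enum class CopyPath { Contiguous, Strided };

// Build a densely packed 2D tensor (higher dims have size 1).
Tensor make_contiguous(uint8_t *data, size_t elem_size, int64_t ne0, int64_t ne1) {
    Tensor t{};
    t.ne[0] = ne0; t.ne[1] = ne1; t.ne[2] = 1; t.ne[3] = 1;
    t.nb[0] = elem_size;
    for (int i = 1; i < 4; ++i) t.nb[i] = t.nb[i - 1] * (size_t) t.ne[i - 1];
    t.elem_size = elem_size;
    t.data = data;
    return t;
}

size_t n_bytes(const Tensor &t) {
    return t.elem_size * (size_t) (t.ne[0] * t.ne[1] * t.ne[2] * t.ne[3]);
}

// A tensor is contiguous when every stride equals the packed stride.
// Strides of size-1 dimensions are irrelevant and therefore ignored.
bool is_contiguous(const Tensor &t) {
    size_t expected = t.elem_size;
    for (int i = 0; i < 4; ++i) {
        if (t.ne[i] != 1 && t.nb[i] != expected) return false;
        expected *= (size_t) t.ne[i];
    }
    return true;
}

CopyPath copy_tensor(const Tensor &src, const Tensor &dst) {
    if (src.elem_size == dst.elem_size && n_bytes(src) == n_bytes(dst) &&
        is_contiguous(src) && is_contiguous(dst)) {
        // Fast path: one block copy. On a GPU backend this is where a
        // device-to-device memcpy or an accelerated copy op would go.
        std::memcpy(dst.data, src.data, n_bytes(src));
        return CopyPath::Contiguous;
    }
    // Generic path (assumes identical shapes): copy element by element.
    assert(src.elem_size == dst.elem_size);
    for (int i = 0; i < 4; ++i) assert(src.ne[i] == dst.ne[i]);
    for (int64_t i3 = 0; i3 < src.ne[3]; ++i3)
    for (int64_t i2 = 0; i2 < src.ne[2]; ++i2)
    for (int64_t i1 = 0; i1 < src.ne[1]; ++i1)
    for (int64_t i0 = 0; i0 < src.ne[0]; ++i0) {
        const uint8_t *s = src.data + i0 * src.nb[0] + i1 * src.nb[1] + i2 * src.nb[2] + i3 * src.nb[3];
        uint8_t       *d = dst.data + i0 * dst.nb[0] + i1 * dst.nb[1] + i2 * dst.nb[2] + i3 * dst.nb[3];
        std::memcpy(d, s, src.elem_size);
    }
    return CopyPath::Strided;
}
```

In this sketch a transposed view (swapped strides) fails the contiguity check and takes the generic path, so swapping the implementation of the fast path changes nothing for non-contiguous copies.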
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
In my local tests on the MTT S80, I observed nearly a 70% ( |
I also have a question regarding how |
@ggerganov can you make @yeahdongcn a collaborator so that he can merge approved PRs at his own discretion?
Yes, invite sent.
Thanks @JohannesGaessler @ggerganov. Just accepted the invitation.
…CHECK Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
I’ve removed |
…ITY op to accelerate D2D memory copy (ggml-org#13647)

* musa: fix build warning (unused parameter)
  Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: upgrade MUSA SDK version to rc4.0.1
  Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: use mudnn::Unary::IDENTITY op to accelerate D2D memory copy
  Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Update ggml/src/ggml-cuda/cpy.cu
  Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* musa: remove MUDNN_CHECK_GEN and use CUDA_CHECK_GEN instead in MUDNN_CHECK
  Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Testing Done
test-backend-ops -o CPY passed
docker run -it -v ~/models:/models local/llama.cpp:light-musa -m /models/deepseek-r1_7b_q4_0.gguf -ngl 999
docker run -p 8080:8080 -it -v ~/models:/models local/llama.cpp:server-musa -m /models/deepseek-r1_7b_q4_0.gguf -ngl 999
docker run -it -v ~/models:/models local/llama.cpp:full-musa --run -m /models/deepseek-r1_7b_q4_0.gguf -ngl 999
Logs: