SYCL: Avoid using SYCL-Graph for unsupported nodes by EwanC · Pull Request #13587 · ggml-org/llama.cpp · GitHub

SYCL: Avoid using SYCL-Graph for unsupported nodes #13587


Merged: 1 commit merged into ggml-org:master on May 22, 2025


EwanC (Contributor) commented May 16, 2025


github-actions bot added the ggml and SYCL labels May 16, 2025
EwanC changed the title from "SYCL: Avoid using with SYCL-Graph for unsupported nodes" to "SYCL: Avoid using SYCL-Graph for unsupported nodes" May 16, 2025
EwanC marked this pull request as ready for review May 16, 2025 12:06
Currently, with SYCL running on a CUDA backend,
`GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0` hits
two operations that throw an exception from their blocking
waits during queue recording:

* `-o CONCAT`: uses blocking waits on a queue that is being recorded (https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/concat.cpp#L185-L187)
* `-o MUL_MAT_ID`: performs a blocking wait on a recording queue for a copy to host memory (https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/ggml-sycl.cpp#L3072-L3074)

We've noticed that `ggml-cuda.cu` has the
[check_node_graph_compatibility_and_refresh_copy_ops](https://github.com/ggml-org/llama.cpp/blob/39e73ae0d69f882d7e29cecc6dd8f5052fca6731/ggml/src/ggml-cuda/ggml-cuda.cu#L2458-L2458)
method for checking whether a graph can be used, even when graphs are enabled.
I've taken a similar approach in this PR by adding a method to `ggml-sycl.cpp`
that checks whether a graph can be used for the given operations, even if a
user has asked for graphs to be enabled.
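The approach described above can be illustrated with a minimal, self-contained C++ sketch. It does not use ggml's real types: `op_kind`, `graph_node`, `graph_is_recordable`, `exec_mode`, `choose_mode` and `example_usage` are hypothetical stand-ins, and the rejected ops simply mirror the two failing tests named in this PR. It shows the shape of the check: scan every node before recording, and fall back to eager execution if any node is unsupported, even when the user has enabled graphs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for ggml's op enum and compute-graph nodes.
enum class op_kind { ADD, MUL_MAT, CONCAT, MUL_MAT_ID };

struct graph_node {
    op_kind op;
};

// Returns false if any node uses an op whose SYCL implementation performs a
// blocking host wait, since such waits throw while the queue is recording.
bool graph_is_recordable(const std::vector<graph_node>& nodes) {
    for (const graph_node& node : nodes) {
        switch (node.op) {
            case op_kind::CONCAT:      // blocking waits inside the op
            case op_kind::MUL_MAT_ID:  // blocking wait on a copy to host memory
                return false;
            default:
                break;
        }
    }
    return true;
}

enum class exec_mode { graph, eager };

// A graph is used only when the user opted in (e.g. GGML_SYCL_DISABLE_GRAPH=0)
// AND every node can be recorded; otherwise the ops run eagerly.
exec_mode choose_mode(bool user_enabled_graph, const std::vector<graph_node>& nodes) {
    return (user_enabled_graph && graph_is_recordable(nodes)) ? exec_mode::graph
                                                              : exec_mode::eager;
}

// Usage sketch: only the graph without unsupported nodes is recorded.
void example_usage() {
    const std::vector<graph_node> plain = {{op_kind::MUL_MAT}, {op_kind::ADD}};
    const std::vector<graph_node> with_mul_mat_id = {{op_kind::MUL_MAT_ID}, {op_kind::ADD}};
    assert(choose_mode(true, plain) == exec_mode::graph);
    assert(choose_mode(true, with_mul_mat_id) == exec_mode::eager);
    assert(choose_mode(false, plain) == exec_mode::eager);
}
```

One likely benefit of checking the whole graph up front, rather than reacting to the exception, is that a recording that would fail partway through is never started.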
EwanC force-pushed the graph_skip_unsupported_nodes branch from 289597c to baf7b65 May 21, 2025 13:46
NeoZhangJianyu (Collaborator)

So, all LLMs including CONCAT or MUL_MAT_ID can't use sycl graph.
Is that right?

EwanC (Contributor, Author) commented May 22, 2025

> So, all LLMs including CONCAT or MUL_MAT_ID can't use sycl graph.

Yup, that's correct. The way these operations are implemented isn't valid usage for creating a SYCL-Graph by queue recording, which leads to exceptions being thrown. I think the CONCAT/MUL_MAT_ID implementations could be reworked to make them valid for recording in a SYCL-Graph, but that's a larger, future task. So instead, this PR avoids the test-backend-ops aborts I'm seeing by disabling SYCL-Graph usage when those operations are detected in the LLM workload.
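Why a blocking wait is invalid during queue recording can be shown with a toy C++ model. This is NOT the SYCL API: `toy_queue`, `op_with_blocking_wait` and `example_usage` are hypothetical. In the model, work submitted while the queue is recording is only captured for later replay, so a host-side wait has nothing that has actually run; the model throws in that case, loosely mirroring the exceptions described in this PR.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

// Toy model only (NOT the SYCL API): a queue that can be switched into a
// recording state, loosely mirroring how work is captured into a graph.
class toy_queue {
  public:
    void begin_recording() { recording_ = true; }
    void end_recording() { recording_ = false; }

    // Eager mode runs work immediately; recording mode only captures it.
    void submit(std::function<void()> work) {
        if (recording_) {
            recorded_.push_back(std::move(work));
        } else {
            work();
        }
    }

    // A blocking host wait is rejected while recording, because the
    // captured work has not executed yet.
    void wait() const {
        if (recording_) {
            throw std::runtime_error("wait() called on a recording queue");
        }
    }

    // Stand-in for executing the recorded graph later.
    void replay() const {
        for (const auto& work : recorded_) {
            work();
        }
    }

  private:
    bool recording_ = false;
    std::vector<std::function<void()>> recorded_;
};

// Hypothetical op that submits a copy and then blocks on the queue before
// continuing on the host, the pattern this PR reports for CONCAT/MUL_MAT_ID.
void op_with_blocking_wait(toy_queue& q, int& out) {
    int* dst = &out;
    q.submit([dst] { *dst = 42; });
    q.wait();  // throws if q is recording
}

void example_usage() {
    int out = 0;

    // Eager execution: the wait is valid and the result is visible afterwards.
    toy_queue eager;
    op_with_blocking_wait(eager, out);
    assert(out == 42);

    // Recording: the same op throws from its wait.
    out = 0;
    toy_queue recording;
    recording.begin_recording();
    bool threw = false;
    try {
        op_with_blocking_wait(recording, out);
    } catch (const std::runtime_error&) {
        threw = true;
    }
    recording.end_recording();
    assert(threw);
    assert(out == 0);  // the captured copy has not run yet
    recording.replay();
    assert(out == 42);
}
```

Calling `example_usage()` runs the same op both ways: eagerly it succeeds, while under recording it throws before any captured work has run.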

NeoZhangJianyu (Collaborator)

OK! It's clear to me!

Maybe we could draft special versions of CONCAT and MUL_MAT_ID for SYCL-Graph.

NeoZhangJianyu merged commit 6b56a64 into ggml-org:master May 22, 2025
46 checks passed
Labels: ggml (changes relating to the ggml tensor library for machine learning), SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language)
3 participants