vulkan: KHR_coopmat flash attention · ggml-org/llama.cpp@1a109e6 · GitHub


Commit 1a109e6

vulkan: KHR_coopmat flash attention
This shader uses coopmat1 to do the Q*K^T multiply. The P*V multiply is more difficult for various reasons, so I haven't done it. Performance for this shader is around 2.5x better than for the scalar shader when doing prompt processing. Some of the benefit may come from other optimizations, such as staging through shared memory or splitting by rows.
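The shader source is not reproduced on this page, so the following is only a plain-Python sketch of the general flash-attention structure the message refers to, not the commit's actual code. It processes one query row at a time and walks K/V in blocks with an online softmax. The comments mark which step corresponds to the Q*K^T multiply (the part the message says uses coopmat1) and which to the P*V multiply (the part left out of the coopmat path). The names `flash_attention`, `naive_attention`, and `block_kv` are hypothetical and chosen for illustration.

```python
import math


def naive_attention(Q, K, V, scale):
    """Reference: softmax(scale * Q K^T) V, computed one query row at a time."""
    out = []
    for q in Q:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in K]
        m = max(scores)
        p = [math.exp(s - m) for s in scores]
        denom = sum(p)
        out.append([sum(p[j] * V[j][d] for j in range(len(V))) / denom
                    for d in range(len(V[0]))])
    return out


def flash_attention(Q, K, V, scale, block_kv=4):
    """Blocked attention with an online softmax (illustrative sketch only)."""
    n_kv, d_v = len(K), len(V[0])
    out = []
    for q in Q:  # each query row is independent of the others
        m = -math.inf        # running maximum of the scores seen so far
        l = 0.0              # running softmax denominator
        acc = [0.0] * d_v    # running, unnormalized output row
        for start in range(0, n_kv, block_kv):
            Kb = K[start:start + block_kv]
            Vb = V[start:start + block_kv]
            # S = Q*K^T for this block: the step a cooperative-matrix
            # multiply would handle in a shader.
            s = [scale * sum(a * b for a, b in zip(q, k)) for k in Kb]
            m_new = max(m, max(s))
            corr = math.exp(m - m_new)  # rescales earlier partial results
            p = [math.exp(x - m_new) for x in s]
            l = l * corr + sum(p)
            # O += P*V for this block: written here as plain scalar loops.
            acc = [acc[d] * corr + sum(p[j] * Vb[j][d] for j in range(len(Vb)))
                   for d in range(d_v)]
            m = m_new
        out.append([a / l for a in acc])
    return out
```

The blocked version never materializes the full score matrix, which is what makes staging K/V blocks through fast memory worthwhile. Up to floating-point rounding it should agree with the naive reference, including when the number of K/V rows is not a multiple of `block_kv`.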
1 parent 62d4250 commit 1a109e6


3 files changed, +702 -54 lines changed


0 commit comments
