Commit 1a109e6

vulkan: KHR_coopmat flash attention

This shader uses coopmat1 to do the Q*K^T multiply. The P*V multiply is more difficult for various reasons, so I haven't done it with coopmat. Performance for this shader is around 2.5x better than for the scalar shader when doing prompt processing. Some of the benefit may come from other optimizations, such as staging through shared memory or splitting by rows.
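For readers unfamiliar with how the two multiplies fit together, here is a minimal CPU reference sketch in C++. It is not the shader from this commit: the function name `flash_attn_ref`, the `block` parameter, and the row-major layout are all hypothetical, and the online-softmax structure is the standard flash attention formulation rather than anything confirmed by the diff. Step 1 marks the Q*K^T product that the commit message says is handled with coopmat1; step 3 marks the P*V product that is not.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical CPU reference for single-head flash attention (row-major).
// Q: [n_q x d], K: [n_kv x d], V: [n_kv x d]; returns O: [n_q x d].
// `block` is an assumed tile width along the KV dimension.
std::vector<float> flash_attn_ref(const std::vector<float>& Q,
                                  const std::vector<float>& K,
                                  const std::vector<float>& V,
                                  std::size_t n_q, std::size_t n_kv,
                                  std::size_t d, std::size_t block) {
    block = std::max<std::size_t>(block, 1);  // avoid a zero-width tile
    const float scale = 1.0f / std::sqrt(static_cast<float>(d));
    std::vector<float> O(n_q * d, 0.0f);
    std::vector<float> S(block);

    for (std::size_t i = 0; i < n_q; ++i) {
        float m = -std::numeric_limits<float>::infinity();  // running row max
        float l = 0.0f;                                     // running softmax sum

        for (std::size_t j0 = 0; j0 < n_kv; j0 += block) {
            const std::size_t bn = std::min(block, n_kv - j0);

            // Step 1: S = scale * Q_i * K_tile^T.
            // This is the product the commit message assigns to coopmat1.
            float m_new = m;
            for (std::size_t j = 0; j < bn; ++j) {
                float acc = 0.0f;
                for (std::size_t k = 0; k < d; ++k)
                    acc += Q[i * d + k] * K[(j0 + j) * d + k];
                S[j] = acc * scale;
                m_new = std::max(m_new, S[j]);
            }

            // Step 2: online softmax. Rescale earlier partial results
            // to the new running max (exp(-inf) == 0 on the first tile).
            const float corr = std::exp(m - m_new);
            l *= corr;
            for (std::size_t k = 0; k < d; ++k)
                O[i * d + k] *= corr;

            // Step 3: O += P * V_tile, done with plain scalar loops here.
            for (std::size_t j = 0; j < bn; ++j) {
                const float p = std::exp(S[j] - m_new);
                l += p;
                for (std::size_t k = 0; k < d; ++k)
                    O[i * d + k] += p * V[(j0 + j) * d + k];
            }
            m = m_new;
        }

        // Final normalization by the softmax denominator.
        if (l > 0.0f)
            for (std::size_t k = 0; k < d; ++k)
                O[i * d + k] /= l;
    }
    return O;
}
```

The result should not depend on `block` beyond floating-point rounding, which makes it easy to compare a tiled implementation against a naive softmax(Q*K^T)*V.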

1 parent 62d4250 commit 1a109e6

3 files changed, +702 -54 lines changed

ggml/src/ggml-vulkan
- ggml-vulkan.cpp
- vulkan-shaders
  - flash_attn_cm1.comp
  - vulkan-shaders-gen.cpp
