vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs by rillomas · Pull Request #14001 · ggml-org/llama.cpp

Merged · 7 commits into ggml-org:master · Jun 5, 2025

Conversation

rillomas (Contributor) commented Jun 4, 2025

Enabling VK_KHR_cooperative_matrix on Intel Xe2 GPUs (currently Lunar Lake and Battlemage) yields a significant performance improvement, while older GPUs like the Arc A770 show performance regressions with it.
This PR enables VK_KHR_cooperative_matrix only for Xe2 GPUs until the regression on older GPUs is resolved.

Reference: #13530

llama-bench results

| OS | Platform | Benchmark | b5589-xe2-enabled | b5583-master | Difference |
| --- | --- | --- | ---: | ---: | ---: |
| Windows 11 24H2 (gfx driver 32.0.101.6795) | U7-268V | pp512 | 416.53 | 148.60 | 280% |
| | | tg128 | 37.86 | 36.87 | 103% |
| | i5-13400 + Arc B580 | pp512 | 1631.44 | 490.89 | 332% |
| | | tg128 | 126.69 | 129.95 | 97% |
| | i9-12900K + Arc A770 | pp512 | 977.81 | 974.92 | 100% |
| | | tg128 | 96.83 | 96.42 | 100% |
| Ubuntu 24.04.2 (Mesa 24.2.8) | U7-268V | pp512 | 167.94 | 122.42 | 137% |
| | | tg128 | 13.67 | 13.75 | 99% |
| | i5-13400 + Arc B580 | pp512 | 583.28 | 420.57 | 139% |
| | | tg128 | 41.80 | 41.82 | 100% |
| | i9-12900K + Arc A770 | pp512 | 328.25 | 333.88 | 98% |
| | | tg128 | 39.00 | 39.14 | 100% |

Windows

Lunar Lake Core Ultra 7 268V

Before

λ llama-bench.exe -m ..\gemma-2-2b-it-Q4_K_M.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) Graphics (Intel Corporation) | uma: 1 | fp16: 1 | warp size: 32 | shared memory: 32768 | int dot: 0 | matrix cores: none
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| gemma2 2B Q4_K - Medium        |   1.59 GiB |     2.61 B | Vulkan     |  99 |           pp512 |        148.60 ± 3.52 |
| gemma2 2B Q4_K - Medium        |   1.59 GiB |     2.61 B | Vulkan     |  99 |           tg128 |         36.87 ± 0.61 |

build: 7e00e60e (5583)

After

λ llama-bench.exe -m ..\gemma-2-2b-it-Q4_K_M.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) Graphics (Intel Corporation) | uma: 1 | fp16: 1 | warp size: 32 | shared memory: 32768 | int dot: 0 | matrix cores: KHR_coopmat
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| gemma2 2B Q4_K - Medium        |   1.59 GiB |     2.61 B | Vulkan     |  99 |           pp512 |       416.53 ± 25.15 |
| gemma2 2B Q4_K - Medium        |   1.59 GiB |     2.61 B | Vulkan     |  99 |           tg128 |         37.86 ± 0.19 |

build: a0dd7795 (5589)

Battlemage Arc B580

Before

λ llama-bench.exe -m ..\gemma-2-2b-it-Q4_K_M.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) B580 Graphics (Intel Corporation) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 32768 | int dot: 0 | matrix cores: none
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| gemma2 2B Q4_K - Medium        |   1.59 GiB |     2.61 B | Vulkan     |  99 |           pp512 |       490.89 ± 12.28 |
| gemma2 2B Q4_K - Medium        |   1.59 GiB |     2.61 B | Vulkan     |  99 |           tg128 |        129.95 ± 0.27 |

build: 7e00e60e (5583)

After

λ llama-bench.exe -m ..\gemma-2-2b-it-Q4_K_M.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) B580 Graphics (Intel Corporation) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 32768 | int dot: 0 | matrix cores: KHR_coopmat
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| gemma2 2B Q4_K - Medium        |   1.59 GiB |     2.61 B | Vulkan     |  99 |           pp512 |       1631.44 ± 4.42 |
| gemma2 2B Q4_K - Medium        |   1.59 GiB |     2.61 B | Vulkan     |  99 |           tg128 |        126.69 ± 0.50 |

build: a0dd7795 (5589)

Alchemist Arc A770

Before

λ llama-bench.exe -m C:\Users\cpie-ace\Documents\Axell\gemma-2-2b-it-Q4_K_M.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) A770 Graphics (Intel Corporation) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 32768 | int dot: 0 | matrix cores: none
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| gemma2 2B Q4_K - Medium        |   1.59 GiB |     2.61 B | Vulkan     |  99 |           pp512 |        974.92 ± 4.06 |
| gemma2 2B Q4_K - Medium        |   1.59 GiB |     2.61 B | Vulkan     |  99 |           tg128 |         96.42 ± 0.40 |

build: 7e00e60e (5583)

After

λ llama-bench.exe -m C:\Users\cpie-ace\Documents\Axell\gemma-2-2b-it-Q4_K_M.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) A770 Graphics (Intel Corporation) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 32768 | int dot: 0 | matrix cores: none
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| gemma2 2B Q4_K - Medium        |   1.59 GiB |     2.61 B | Vulkan     |  99 |           pp512 |        977.81 ± 1.79 |
| gemma2 2B Q4_K - Medium        |   1.59 GiB |     2.61 B | Vulkan     |  99 |           tg128 |         96.83 ± 0.30 |

build: a0dd7795 (5589)

Linux

Lunar Lake Core Ultra 7 268V

Before

$ ./llama-bench -m ~/Downloads/gemma-2-2b-it-Q4_K_M.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Graphics (LNL) (Intel open-source Mesa driver) | uma: 1 | fp16: 1 | warp size: 32 | shared memory: 131072 | int dot: 1 | matrix cores: none
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| gemma2 2B Q4_K - Medium        |   1.59 GiB |     2.61 B | Vulkan     |  99 |           pp512 |        122.42 ± 0.63 |
| gemma2 2B Q4_K - Medium        |   1.59 GiB |     2.61 B | Vulkan     |  99 |           tg128 |         13.75 ± 0.01 |

build: 7e00e60e (5583)

After

$ ./llama-bench -m ~/Downloads/gemma-2-2b-it-Q4_K_M.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Graphics (LNL) (Intel open-source Mesa driver) | uma: 1 | fp16: 1 | warp size: 32 | shared memory: 131072 | int dot: 1 | matrix cores: KHR_coopmat
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| gemma2 2B Q4_K - Medium        |   1.59 GiB |     2.61 B | Vulkan     |  99 |           pp512 |        167.94 ± 0.63 |
| gemma2 2B Q4_K - Medium        |   1.59 GiB |     2.61 B | Vulkan     |  99 |           tg128 |         13.67 ± 0.05 |

build: a0dd7795 (5589)

Battlemage Arc B580

Before

$ ./llama-bench -m ~/Downloads/gemma-2-2b-it-Q4_K_M.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Graphics (BMG G21) (Intel open-source Mesa driver) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 163840 | int dot: 1 | matrix cores: none
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| gemma2 2B Q4_K - Medium        |   1.59 GiB |     2.61 B | Vulkan     |  99 |           pp512 |        420.52 ± 0.19 |
| gemma2 2B Q4_K - Medium        |   1.59 GiB |     2.61 B | Vulkan     |  99 |           tg128 |         41.82 ± 0.31 |

build: 7e00e60e (5583)

After

$ ./llama-bench -m ~/Downloads/gemma-2-2b-it-Q4_K_M.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Graphics (BMG G21) (Intel open-source Mesa driver) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 163840 | int dot: 1 | matrix cores: KHR_coopmat
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| gemma2 2B Q4_K - Medium        |   1.59 GiB |     2.61 B | Vulkan     |  99 |           pp512 |        583.28 ± 0.69 |
| gemma2 2B Q4_K - Medium        |   1.59 GiB |     2.61 B | Vulkan     |  99 |           tg128 |         41.80 ± 0.27 |

build: a0dd7795 (5589)

Alchemist Arc A770

Before

$ ./llama-bench -m ~/Downloads/gemma-2-2b-it-Q4_K_M.gguf 
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(tm) A770 Graphics (DG2) (Intel open-source Mesa driver) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 65536 | int dot: 1 | matrix cores: none
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| gemma2 2B Q4_K - Medium        |   1.59 GiB |     2.61 B | Vulkan     |  99 |           pp512 |        333.88 ± 0.34 |
| gemma2 2B Q4_K - Medium        |   1.59 GiB |     2.61 B | Vulkan     |  99 |           tg128 |         39.14 ± 0.10 |

build: 7e00e60e (5583)

After

$ ./llama-bench -m ~/Downloads/gemma-2-2b-it-Q4_K_M.gguf 
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(tm) A770 Graphics (DG2) (Intel open-source Mesa driver) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 65536 | int dot: 1 | matrix cores: none
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| gemma2 2B Q4_K - Medium        |   1.59 GiB |     2.61 B | Vulkan     |  99 |           pp512 |        328.35 ± 5.95 |
| gemma2 2B Q4_K - Medium        |   1.59 GiB |     2.61 B | Vulkan     |  99 |           tg128 |         39.00 ± 0.24 |

build: a0dd7795 (5589)

@github-actions github-actions bot added Vulkan Issues specific to the Vulkan backend ggml changes relating to the ggml tensor library for machine learning labels Jun 4, 2025
@rillomas rillomas marked this pull request as ready for review June 4, 2025 02:24
0cc4m (Collaborator) left a comment
This looks good, did you also check performance on Linux to make sure that we don't cause a regression for Xe2 there?

rillomas (Contributor, Author) commented Jun 4, 2025

Thanks for your review. I'll benchmark the current version on Ubuntu and update the PR soon.

Update: I've updated the PR comment with the Ubuntu data.

0cc4m (Collaborator) commented Jun 5, 2025

Thank you for testing with Ubuntu. There is no reason not to merge, but it's seriously disappointing how much worse the Linux driver currently is. I hope Intel closes that gap in the near future.

codecnotsupported commented

> Thank you for testing with Ubuntu. There is no reason not to merge, but it's seriously disappointing how much worse the Linux driver currently is. I hope Intel closes that gap in the near future.

I believe there is an issue in the Mesa bug tracker for that exact problem.
That said, the SYCL backend has better performance.

$ ./build/bin/llama-bench -m ./models/gemma-2-2b-it-q4_k_m.gguf 
warning: asserts enabled, performance may be affected
register_backend: registered backend SYCL (1 devices)
register_device: registered device SYCL0 (Intel(R) Arc(TM) B580 Graphics)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (11th Gen Intel(R) Core(TM) i5-11400F @ 2.60GHz)
load_backend: failed to find ggml_backend_init in ./llama.cpp/build/bin/libggml-sycl.so
load_backend: failed to find ggml_backend_init in ./llama.cpp/build/bin/libggml-cpu.so
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| gemma2 2B Q4_K - Medium        |   1.59 GiB |     2.61 B | SYCL       |  99 |           pp512 |      2186.80 ± 16.75 |
| gemma2 2B Q4_K - Medium        |   1.59 GiB |     2.61 B | SYCL       |  99 |           tg128 |         54.23 ± 0.31 |

0cc4m (Collaborator) commented Jun 5, 2025

Yeah, we created an issue about coopmat performance, but not about the general performance issues. SYCL is faster, but less flexible than Vulkan.

0cc4m (Collaborator) left a comment
LGTM. Thank you for the contribution!

@0cc4m 0cc4m merged commit 669c13e into ggml-org:master Jun 5, 2025
45 checks passed
@rillomas rillomas deleted the allow-list-intel-coopmat branch June 5, 2025 21:56
furyhawk pushed a commit to furyhawk/llama.cpp that referenced this pull request Jun 6, 2025
…ggml-org#14001)

* allowing B580 and U9-288V

* experimenting code to detect Xe2

* allowing coopmat only for Xe2 GPUs

* fixed comment wording

* fixed comment wording

* removed unnecessary driver check