vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs #14001
This looks good. Did you also check performance on Linux to make sure that we don't cause a regression for Xe2 there?
Thanks for your review. I'll benchmark the current version on Ubuntu and update the PR soon.
Update: I've updated the PR comment with Ubuntu data.
Thank you for testing with Ubuntu. There is no reason not to merge, but it's seriously disappointing how much worse the Linux driver currently is. I hope Intel closes that gap in the near future.
I believe there is an issue in the Mesa bug tracker for that exact problem.
Yeah, we created an issue about coopmat performance, but not about the general performance issues. SYCL is faster, but less flexible than Vulkan.
LGTM. Thank you for the contribution!
…ggml-org#14001)
* allowing B580 and U9-288V
* experimenting code to detect Xe2
* allowing coopmat only for Xe2 GPUs
* fixed comment wording
* fixed comment wording
* removed unnecessary driver check
Enabling VK_KHR_cooperative_matrix on Intel Xe2 GPUs (currently Lunar Lake and Battlemage) gives a significant performance improvement, but it causes performance regressions on older GPUs such as the Arc A770.
This PR enables VK_KHR_cooperative_matrix only for Xe2 GPUs until the performance regression is resolved for older GPUs.
Reference: #13530
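The gating policy described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `DeviceInfo`, `GpuArch`, `classify`, and `should_enable_coopmat` are hypothetical names, and the detection heuristic (treating an Intel device whose minimum subgroup size is 16 as Xe2) is an assumption made for the sketch. In a real backend the values would come from the Vulkan physical device properties, for example `minSubgroupSize` in `VkPhysicalDeviceSubgroupSizeControlProperties`.

```cpp
#include <cstdint>

// Hypothetical types for illustration only; not the identifiers used in the PR.
enum class GpuArch { Other, IntelXe2 };

// PCI vendor ID assigned to Intel.
constexpr std::uint32_t kVendorIdIntel = 0x8086;

struct DeviceInfo {
    std::uint32_t vendor_id;         // VkPhysicalDeviceProperties::vendorID
    std::uint32_t min_subgroup_size; // e.g. minSubgroupSize from the subgroup size control properties
    bool has_coopmat_extension;      // driver advertises VK_KHR_cooperative_matrix
};

// ASSUMPTION: an Intel device reporting a minimum subgroup size of 16 is
// treated as Xe2. This heuristic is part of the sketch, not a documented rule.
inline GpuArch classify(const DeviceInfo& d) {
    if (d.vendor_id == kVendorIdIntel && d.min_subgroup_size == 16) {
        return GpuArch::IntelXe2;
    }
    return GpuArch::Other;
}

// Decide whether to use cooperative matrix on this device.
inline bool should_enable_coopmat(const DeviceInfo& d) {
    if (!d.has_coopmat_extension) {
        return false; // nothing to enable if the driver does not expose it
    }
    if (d.vendor_id != kVendorIdIntel) {
        return true;  // ASSUMPTION: other vendors are outside this policy and pass through
    }
    // Intel: allow only Xe2 (Lunar Lake, Battlemage); older parts stay disabled.
    return classify(d) == GpuArch::IntelXe2;
}
```

Under these assumptions, a device described as `DeviceInfo{0x8086, 16, true}` would have the extension enabled, while an Intel device reporting a smaller minimum subgroup size would keep it disabled even if the driver advertises the extension.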
llama-bench results
Windows
Lunar Lake Core Ultra 7 268V
Before
After
Battlemage Arc B580
Before
After
Alchemist Arc A770
Before
After
Linux
Lunar Lake Core Ultra 7 268V
Before
After
Battlemage Arc B580
Before
After
Alchemist Arc A770
Before
After