metal : add FA-vec kernel for head size 64 by ggerganov · Pull Request #13583 · ggml-org/llama.cpp · GitHub

metal : add FA-vec kernel for head size 64 #13583


Merged: 1 commit, May 16, 2025


ggerganov (Member)

Useful for models with an attention head size of 64, such as Llama 3.2 1B and Whisper.
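To make the head-size-64 case concrete: a flash-attention "vec" path presumably handles a single query row at a time, which is the shape of each token-generation step (the benchmark below measures only generation). The following is a plain-Python illustration of the underlying math, single-query attention computed with an online softmax over blocks of the KV cache, and not the Metal kernel from this PR. The function names, the block size of 32, and the random test data are hypothetical.

```python
import math
import random

HEAD_DIM = 64  # head size this PR targets; everything else here is illustrative


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_reference(q, keys, values):
    """Naive single-query attention: softmax(q . k / sqrt(d)) weighted sum of V rows."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * dot(q, k) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


def attention_online(q, keys, values, block=32):
    """Same result, streaming over KV blocks with a running max and running sum."""
    scale = 1.0 / math.sqrt(len(q))
    m = -math.inf                 # running maximum score seen so far
    s = 0.0                       # running sum of exp(score - m)
    acc = [0.0] * len(values[0])  # running sum of exp(score - m) * v
    for start in range(0, len(keys), block):
        ks = keys[start:start + block]
        vs = values[start:start + block]
        scores = [scale * dot(q, k) for k in ks]
        m_new = max(m, max(scores))
        corr = math.exp(m - m_new)  # rescales the old state; 0.0 on the first block
        ps = [math.exp(x - m_new) for x in scores]
        s = s * corr + sum(ps)
        acc = [a * corr + sum(p * v[j] for p, v in zip(ps, vs))
               for j, a in enumerate(acc)]
        m = m_new
    return [a / s for a in acc]


random.seed(0)
n_kv = 100  # KV-cache length (context depth); deliberately not a multiple of the block size
q = [random.uniform(-1, 1) for _ in range(HEAD_DIM)]
K = [[random.uniform(-1, 1) for _ in range(HEAD_DIM)] for _ in range(n_kv)]
V = [[random.uniform(-1, 1) for _ in range(HEAD_DIM)] for _ in range(n_kv)]

ref = attention_reference(q, K, V)
out = attention_online(q, K, V)
print(max(abs(a - b) for a, b in zip(ref, out)))  # agreement up to float rounding
```

The rescaling by `corr` is what allows the KV cache to be consumed block by block without keeping every score in memory; the per-query state is only `m`, `s`, and a 64-element accumulator.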

./scripts/compare-commits.sh master gg/metal-fa-vec-64 -m ./models/llama-3.2-1b-instruct/ggml-model-q8_0.gguf -fa 1 -p 0 -d 0,512,1024,8192 -n 32
| Model | Test | t/s master | t/s gg/metal-fa-vec-64 | Speedup |
| --- | --- | ---: | ---: | ---: |
| llama 1B Q8_0 | tg32 | 229.50 | 269.18 | 1.17 |
| llama 1B Q8_0 | tg32@d512 | 223.47 | 260.33 | 1.16 |
| llama 1B Q8_0 | tg32@d1024 | 210.76 | 253.69 | 1.20 |
| llama 1B Q8_0 | tg32@d8192 | 114.52 | 203.24 | 1.77 |
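The Speedup column is the ratio of the two t/s columns. The short script below recomputes it from the table values and converts throughput to milliseconds per token. Subtracting the depth-0 latency from the depth-8192 latency gives a rough estimate of the depth-dependent cost per token, under the simplifying assumption (ours, not the PR's) that this extra cost comes mostly from attending over a longer KV cache.

```python
# (test, t/s master, t/s gg/metal-fa-vec-64, reported speedup), copied from the table
rows = [
    ("tg32", 229.50, 269.18, 1.17),
    ("tg32@d512", 223.47, 260.33, 1.16),
    ("tg32@d1024", 210.76, 253.69, 1.20),
    ("tg32@d8192", 114.52, 203.24, 1.77),
]


def ms_per_token(tokens_per_second):
    """Convert throughput (t/s) to average latency per generated token in ms."""
    return 1000.0 / tokens_per_second


for test, master, branch, reported in rows:
    speedup = branch / master
    assert round(speedup, 2) == reported  # matches the Speedup column
    print(f"{test:<11} x{speedup:.2f}  "
          f"{ms_per_token(master):.2f} ms -> {ms_per_token(branch):.2f} ms")

# Extra per-token latency at depth 8192 compared with depth 0 (no prior context)
extra_master = ms_per_token(rows[3][1]) - ms_per_token(rows[0][1])
extra_branch = ms_per_token(rows[3][2]) - ms_per_token(rows[0][2])
print(f"depth-dependent cost: {extra_master:.2f} ms -> {extra_branch:.2f} ms")
```

With these numbers the extra per-token cost at depth 8192 falls from about 4.37 ms to about 1.21 ms, which is consistent with the speedup growing with context depth in the table.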

github-actions bot added the ggml and Apple Metal labels on May 16, 2025
ggerganov merged commit 654a677 into master on May 16, 2025
51 checks passed
ggerganov deleted the gg/metal-fa-vec-64 branch on May 16, 2025 at 17:33
Labels: Apple Metal (https://en.wikipedia.org/wiki/Metal_(API)), ggml (changes relating to the ggml tensor library for machine learning)