OpenCL: Performance comparison depending on gpu_offloads #12810
Comments
@lhez, @max-krasnyansky
Try Q4_0? I tried Q4_K_M in the past and the performance was not good either.
Thanks. I'm going to test with Q4_0 as well.
@sparkleholic - currently Q4_0 is optimized, so you will need to use a Q4_0 model. Adreno 740 should work just fine. Feel free to reply back if you see any issue with 740.
@lhez, @kizuna0487
This issue was closed because it has been inactive for 14 days since being marked as stale. |
I expected more gpu_offloads to give better performance (tokens/sec); however, the benchmark results showed otherwise.
The following benchmarks were run on a QCS8550 with this model: https://huggingface.co/LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct-GGUF/blob/main/EXAONE-3.5-2.4B-Instruct-Q4_K_M.gguf
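For anyone who wants to repeat this kind of comparison, one option is to sweep the number of offloaded layers in a single `llama-bench` run, since its `-ngl` option accepts a comma-separated list. The following is only a minimal sketch: the binary path, the helper name `build_bench_command`, and the layer counts are assumptions for illustration, not the values used in this issue.

```python
import shlex


def build_bench_command(model_path, gpu_layers, binary="./llama-bench"):
    """Build a llama-bench argv that sweeps several -ngl values in one run.

    `binary` and `gpu_layers` are placeholders; adjust them to your build
    and to the layer count of your model.
    """
    if not gpu_layers:
        raise ValueError("need at least one -ngl value")
    # llama-bench takes comma-separated values and benchmarks each in turn.
    ngl = ",".join(str(n) for n in gpu_layers)
    return [binary, "-m", model_path, "-ngl", ngl]


if __name__ == "__main__":
    # Hypothetical sweep: 0 layers (CPU only) up to 32 layers offloaded.
    cmd = build_bench_command(
        "EXAONE-3.5-2.4B-Instruct-Q4_K_M.gguf", [0, 8, 16, 24, 32]
    )
    # Print a shell-safe command line to copy onto the device.
    print(shlex.join(cmd))
```

Running the printed command on the device gives one result row per `-ngl` value, which makes it easier to see where tokens/sec stops improving.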