Worse speed and GPU load than pure llama-cpp · abetlen llama-cpp-python · Discussion #1831 · GitHub

Worse speed and GPU load than pure llama-cpp #1831

Answered by Mushoz
Mushoz asked this question in Q&A

Managed to find the answer myself. For some reason the logits_all parameter defaults to true and tanks performance. Setting it to false brings the performance on par with pure llama-cpp. Not sure if that's a sensible default, but at least I managed to solve the problem. GPU load is also back to 100% again.
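For anyone hitting the same issue, here is a minimal sketch of passing the parameter explicitly when constructing the model. The model path and the other constructor values below are only illustrative; adjust them for your own setup.

```python
from llama_cpp import Llama

# Load the model with logits_all explicitly disabled, so logits are only
# computed for the last token rather than for every token in the prompt.
llm = Llama(
    model_path="./models/model.gguf",  # hypothetical path, replace with your own
    n_gpu_layers=-1,   # offload all layers to the GPU
    n_ctx=4096,        # context window, adjust to your model
    logits_all=False,  # avoid computing logits for every prompt token
)

# Simple completion call to confirm generation speed and GPU load.
output = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
print(output["choices"][0]["text"])
```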

Replies: 1 comment, 2 replies (@ExtReMLapin, @gl2007)
Answer selected by Mushoz
Category: Q&A · Labels: None yet · 3 participants