Name and Version
build b5155
Operating systems
Linux
GGML backends
CUDA
Hardware
RTX 3090 24GB
Models
LatentWanderer/featherless-ai_Qwerky-QwQ-32B-gguf
Problem description & steps to reproduce
llama-cli works as intended. But when running llama-server, only the first generation works correctly.
Once that first generation finishes or is cancelled, the server crashes on any new generation attempt.
What exactly I did:
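A minimal sketch of the reproduction (the model path, flags, and prompts below are illustrative assumptions, not the exact commands used):

```sh
# Launch the server (model path and flags are illustrative)
./llama-server -m Qwerky-QwQ-32B.gguf -ngl 99 --port 8080

# First completion request: works as expected
curl -s http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello", "n_predict": 32}'

# Second request: the server crashes instead of answering
curl -s http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello again", "n_predict": 32}'
```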
First Bad Commit
I can't tell exactly right now, but with a much older version, for example b4616, the bug does not occur.
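One way to narrow down the first bad commit is a bisect between the two release tags mentioned above; a sketch, assuming a standard CUDA build at each step:

```sh
# b5155 is known bad, b4616 is known good (tags from this report)
git bisect start b5155 b4616

# At each bisect step: rebuild with CUDA, then retry the
# two-request reproduction against llama-server
cmake -B build -DGGML_CUDA=ON && cmake --build build -j

# Mark the result and continue until the first bad commit is found
git bisect good   # or: git bisect bad
```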
Relevant log output
Here are two consecutive generation requests: