Description
Hi,
I am playing around with the native API, and it works well when just using the basic example:
curl --request POST \
  --url http://localhost:8080/completion \
  --header "Content-Type: application/json" \
  --data '{"prompt": "Building a website can be done in 10 simple steps:","n_predict": 128}'
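For completeness, the same request can be made from Python. This is a minimal sketch using only the standard library, mirroring the curl call above (the `prompt` and `n_predict` fields are the only ones assumed):

```python
import json
import urllib.request

def build_completion_request(prompt, n_predict=128,
                             url="http://localhost:8080/completion"):
    """Build a POST request for the /completion endpoint, mirroring the curl call."""
    payload = {"prompt": prompt, "n_predict": n_predict}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_completion_request("Building a website can be done in 10 simple steps:")
print(req.get_full_url())
# To actually send it (requires a running server):
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```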
However, if I add the system_prompt
parameter to the query, it hangs indefinitely: nothing is printed on the server side and no load shows up in nvtop.
Server command
llama-server --host 0.0.0.0 -m /models/mistral-7b-instruct-v0.1.Q5_K_M.gguf -c 8000 -ngl 100
Server output
llama server listening at http://0.0.0.0:8080
{"timestamp":1698170944,"level":"INFO","function":"main","line":2499,"message":"HTTP server listening","hostname":"0.0.0.0","port":8080}
all slots are idle and system prompt is empty, clear the KV cache
Adding -v to llama-server makes no difference in the output.
This query hangs when system_prompt is used
Query
curl --request POST \
  --url http://127.0.0.1:8080/completion \
  --header "Content-Type: application/json" \
  --data '{
    "prompt": "User: What is your name ?\nAssistant:",
    "system_prompt": {
      "anti_prompt": "User:",
      "assistant_name": "Assistant:",
      "prompt": "You are an angry assistant that swears alot and your name is Bob\n"
    },
    "temperature": 0.8
  }'
Any ideas what I am missing here? What I am trying to achieve is to give the model some context.
What's even more strange is that after trying the above query, simple queries no longer work either; they hang in the same way until the server is restarted.
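For reference, the kind of context I'm after could also be approximated without system_prompt by folding it into the prompt string itself. A sketch (the helper below is hypothetical; the only /completion fields assumed are prompt, temperature, n_predict, and stop, using the anti-prompt as a stop string so generation doesn't run into a new "User:" turn):

```python
import json

def build_prompt_with_context(system_text, user_text,
                              user_tag="User:", assistant_tag="Assistant:"):
    """Fold the would-be system_prompt into the prompt text itself."""
    return f"{system_text}\n{user_tag} {user_text}\n{assistant_tag}"

payload = {
    "prompt": build_prompt_with_context(
        "You are an angry assistant that swears a lot and your name is Bob",
        "What is your name ?",
    ),
    "temperature": 0.8,
    "n_predict": 128,
    # Stop generation before the model starts writing the next user turn.
    "stop": ["User:"],
}
print(json.dumps(payload, indent=2))
```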