From b569dbb13a5324a3ca76d8cdf802de5ab253b63f Mon Sep 17 00:00:00 2001
From: Juraj Bednar
Date: Wed, 22 Nov 2023 07:06:47 +0100
Subject: [PATCH] Better documentation on server's prompt formatting.

---
 README.md | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/README.md b/README.md
index 5596866c1..743947882 100644
--- a/README.md
+++ b/README.md
@@ -177,6 +177,15 @@ Navigate to [http://localhost:8000/docs](http://localhost:8000/docs) to see the
 
 To bind to `0.0.0.0` to enable remote connections, use `python3 -m llama_cpp.server --host 0.0.0.0`. Similarly, to change the port (default is 8000), use `--port`.
 
+You probably also want to set the prompt format. For chatml, use
+
+```bash
+python3 -m llama_cpp.server --model models/7B/llama-model.gguf --chat_format chatml
+```
+
+That will format the prompt the way the model expects it. You can find the expected prompt format in the model card.
+For possible options, see [llama_cpp/llama_chat_format.py](llama_cpp/llama_chat_format.py) and look for lines starting with "@register_chat_format".
+
 ## Docker image
 
 A Docker image is available on [GHCR](https://ghcr.io/abetlen/llama-cpp-python). To run the server:
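
Since the added text points readers at the `@register_chat_format` decorator, a quick way to enumerate the registered format names is a one-line grep. This is a minimal sketch, assuming a local checkout of the repository; the quoted string passed to the decorator (e.g. "chatml") is the value to hand to `--chat_format`:

```bash
# List registered chat formats in the source file; the quoted decorator
# argument on each matching line is the name to pass to --chat_format.
grep '@register_chat_format' llama_cpp/llama_chat_format.py
```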