docs: add server config docs · jmformenti/llama-cpp-python@522aecb · GitHub

Commit 522aecb

committed
docs: add server config docs
1 parent 6473796 commit 522aecb

File tree

2 files changed: +102 −2 lines changed

docs/server.md

Lines changed: 95 additions & 1 deletion
@@ -32,6 +32,12 @@ python3 -m llama_cpp.server --help
NOTE: All server options are also available as environment variables. For example, `--model` can be set by setting the `MODEL` environment variable.
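For example, these two invocations are equivalent (a minimal illustration; the model path is hypothetical):

```bash
# Pass the model via the CLI flag (path is illustrative)
python3 -m llama_cpp.server --model ./models/mistral-7b-v0.1.Q4_K_M.gguf

# ...or via the corresponding environment variable
MODEL=./models/mistral-7b-v0.1.Q4_K_M.gguf python3 -m llama_cpp.server
```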

Check out the server config reference below for more information on the available options.

CLI arguments and environment variables are available for all of the fields defined in [`ServerSettings`](#llama_cpp.server.settings.ServerSettings) and [`ModelSettings`](#llama_cpp.server.settings.ModelSettings).

Additionally, the server supports configuration via a config file; check out the [configuration section](#configuration-and-multi-model-support) for more information and examples.
## Guides

### Code Completion
@@ -121,4 +127,92 @@ response = client.chat.completions.create(
    ],
)
print(response)
```

## Configuration and Multi-Model Support
The server supports configuration via a JSON config file that can be passed using the `--config_file` parameter or the `CONFIG_FILE` environment variable.
135+
136+
```bash
137+
python3 -m llama_cpp.server --config_file <config_file>
138+
```
139+
140+
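The environment-variable form is equivalent (the file name here is illustrative):

```bash
# Same as passing --config_file on the command line
CONFIG_FILE=config.json python3 -m llama_cpp.server
```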
Config files support all of the server and model options supported by the CLI and environment variables; however, instead of only a single model, the config file can specify multiple models.
The server supports routing requests to multiple models based on the `model` parameter in the request, which is matched against the `model_alias` in the config file (see the client sketch after the example config below).
At the moment only a single model is loaded into memory at a time; the server will automatically load and unload models as needed.
```json
{
    "host": "0.0.0.0",
    "port": 8080,
    "models": [
        {
            "model": "models/OpenHermes-2.5-Mistral-7B-GGUF/openhermes-2.5-mistral-7b.Q4_K_M.gguf",
            "model_alias": "gpt-3.5-turbo",
            "chat_format": "chatml",
            "n_gpu_layers": -1,
            "offload_kqv": true,
            "n_threads": 12,
            "n_batch": 512,
            "n_ctx": 2048
        },
        {
            "model": "models/OpenHermes-2.5-Mistral-7B-GGUF/openhermes-2.5-mistral-7b.Q4_K_M.gguf",
            "model_alias": "gpt-4",
            "chat_format": "chatml",
            "n_gpu_layers": -1,
            "offload_kqv": true,
            "n_threads": 12,
            "n_batch": 512,
            "n_ctx": 2048
        },
        {
            "model": "models/ggml_llava-v1.5-7b/ggml-model-q4_k.gguf",
            "model_alias": "gpt-4-vision-preview",
            "chat_format": "llava-1-5",
            "clip_model_path": "models/ggml_llava-v1.5-7b/mmproj-model-f16.gguf",
            "n_gpu_layers": -1,
            "offload_kqv": true,
            "n_threads": 12,
            "n_batch": 512,
            "n_ctx": 2048
        },
        {
            "model": "models/mistral-7b-v0.1-GGUF/ggml-model-Q4_K.gguf",
            "model_alias": "text-davinci-003",
            "n_gpu_layers": -1,
            "offload_kqv": true,
            "n_threads": 12,
            "n_batch": 512,
            "n_ctx": 2048
        },
        {
            "model": "models/replit-code-v1_5-3b-GGUF/replit-code-v1_5-3b.Q4_0.gguf",
            "model_alias": "copilot-codex",
            "n_gpu_layers": -1,
            "offload_kqv": true,
            "n_threads": 12,
            "n_batch": 1024,
            "n_ctx": 9216
        }
    ]
}
```
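With a config like the one above loaded, a client selects a model by sending its alias as the `model` parameter. Below is a minimal sketch using the `openai` Python client; the base URL, port, and API key value are illustrative, not part of this commit:

```python
from openai import OpenAI

# Point the OpenAI client at the locally running llama-cpp-python server
# (host/port must match the config file; the API key is just a placeholder).
client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-required")

# The `model` value is matched against a `model_alias` entry in the config,
# so "gpt-4" here routes to the first OpenHermes entry defined above.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```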
The config file format is defined by the [`ConfigFileSettings`](#llama_cpp.server.settings.ConfigFileSettings) class.
## Server Options Reference
::: llama_cpp.server.settings.ConfigFileSettings
    options:
        show_if_no_docstring: true

::: llama_cpp.server.settings.ServerSettings
    options:
        show_if_no_docstring: true

::: llama_cpp.server.settings.ModelSettings
    options:
        show_if_no_docstring: true

llama_cpp/server/settings.py

Lines changed: 7 additions & 1 deletion
@@ -13,6 +13,8 @@
class ModelSettings(BaseSettings):
    """Model settings used to load a Llama model."""

    model: str = Field(
        description="The path to the model to use for generating completions."
    )
@@ -131,6 +133,8 @@ class ModelSettings(BaseSettings):
class ServerSettings(BaseSettings):
    """Server settings used to configure the FastAPI and Uvicorn server."""

    # Uvicorn Settings
    host: str = Field(default="localhost", description="Listen address")
    port: int = Field(default=8000, description="Listen port")
@@ -156,6 +160,8 @@ class Settings(ServerSettings, ModelSettings):
class ConfigFileSettings(ServerSettings):
    """Configuration file format settings."""

    models: List[ModelSettings] = Field(
-        default=[], description="Model configs, overwrites default config"
+        default=[], description="Model configs"
    )
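For illustration, a hypothetical sketch of how a JSON config file maps onto these settings classes (assumes pydantic v2-style parsing and a `model_alias` field on `ModelSettings`; not part of this commit):

```python
from llama_cpp.server.settings import ConfigFileSettings

# Hypothetical usage: parse a config file into ConfigFileSettings
# (the file name is illustrative).
with open("config.json") as f:
    config = ConfigFileSettings.model_validate_json(f.read())

# Server-level fields come from ServerSettings, model entries from ModelSettings
print(config.host, config.port)
for model in config.models:
    print(model.model_alias, "->", model.model)
```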
