1 parent 66fb034 commit d018c7b
llama_cpp/llama.py
@@ -239,6 +239,7 @@ def __init__(
             n_ctx: Maximum context size.
             n_parts: Number of parts to split the model into. If -1, the number of parts is automatically determined.
             seed: Random seed. -1 for random.
+            n_gpu_layers: Number of layers to offload to GPU (-ngl). If -1, all layers are offloaded.
             f16_kv: Use half-precision for key/value cache.
             logits_all: Return logits for all tokens, not just the last token.
             vocab_only: Only load the vocabulary, no weights.
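
For reference, a minimal usage sketch of the newly documented parameter, assuming a local GGML model file (the path below is a placeholder, not part of this commit):

from llama_cpp import Llama

# Offload every layer to the GPU: per the new docstring, -1 means all layers.
# Pass a smaller number for partial offload on GPUs with limited VRAM.
llm = Llama(
    model_path="./models/7B/ggml-model.bin",  # placeholder path; substitute your own model
    n_gpu_layers=-1,
)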