# // Keep the booleans together to avoid misalignment during copy-by-value.
# bool vocab_only; // only load the vocabulary, no weights
# bool use_mmap; // use mmap if possible
class llama_model_params(Structure):
    """
    Attributes:
        n_gpu_layers (int): number of layers to store in VRAM
        split_mode (int): how to split the model across multiple GPUs
        main_gpu (int): the GPU that is used for the entire model. main_gpu interpretation depends on split_mode: LLAMA_SPLIT_NONE: the GPU that is used for the entire model; LLAMA_SPLIT_ROW: the GPU that is used for small tensors and intermediate results; LLAMA_SPLIT_LAYER: ignored
        tensor_split (ctypes.Array[ctypes.c_float]): proportion of the model (layers or rows) to offload to each GPU, size: LLAMA_MAX_DEVICES
        progress_callback (llama_progress_callback): called with a progress value between 0.0 and 1.0. Pass NULL to disable. If the provided progress_callback returns true, model loading continues. If it returns false, model loading is immediately aborted.
        progress_callback_user_data (ctypes.c_void_p): context pointer passed to the progress callback
        kv_overrides (ctypes.Array[llama_model_kv_override]): override key-value pairs of the model meta data
    """
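To make the attribute list above concrete, here is a minimal sketch of filling in such a ctypes structure. `ModelParamsSketch` is a hypothetical stand-in mirroring only a few of the fields, with `LLAMA_MAX_DEVICES` assumed to be 2; the real `llama_model_params` structure has more fields and its exact field order and types must match the C header.

```python
import ctypes

# Hypothetical mirror of a few llama_model_params fields, for illustration only.
class ModelParamsSketch(ctypes.Structure):
    _fields_ = [
        ("n_gpu_layers", ctypes.c_int32),
        ("split_mode", ctypes.c_int32),
        ("main_gpu", ctypes.c_int32),
        ("tensor_split", ctypes.c_float * 2),  # size assumed: LLAMA_MAX_DEVICES = 2
        ("vocab_only", ctypes.c_bool),
        ("use_mmap", ctypes.c_bool),
    ]

params = ModelParamsSketch()
params.n_gpu_layers = 32  # store 32 layers in VRAM
params.main_gpu = 0
# offload 75% of layers/rows to GPU 0 and 25% to GPU 1
params.tensor_split = (ctypes.c_float * 2)(0.75, 0.25)
params.vocab_only = False  # load weights, not just the vocabulary
params.use_mmap = True
```

Because ctypes structures are passed by value to C, field order and alignment must match the header exactly; this is why the header comment asks to keep the booleans together.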
# /// @details Apply classifier-free guidance to the logits as described in academic paper "Stay on topic with Classifier-Free Guidance" https://arxiv.org/abs/2306.17806
# /// @param logits Logits extracted from the original generation context.
# /// @param logits_guidance Logits extracted from a separate context from the same model. Other than a negative prompt at the beginning, it should have all generated and user input tokens copied from the main context.
# /// @param scale Guidance strength. 1.0f means no guidance. Higher values mean stronger guidance.
# LLAMA_API void llama_sample_apply_guidance(
#         struct llama_context * ctx,
#                        float * logits,
#                        float * logits_guidance,
#                          float   scale);
def llama_sample_apply_guidance(
    ctx: llama_context_p,
    logits,  # type: _Pointer[c_float]
    logits_guidance,  # type: _Pointer[c_float]
    scale: Union[c_float, float],
):
    """Apply classifier-free guidance to the logits as described in academic paper "Stay on topic with Classifier-Free Guidance" https://arxiv.org/abs/2306.17806"""
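The parameter docs above can be illustrated with a pure-Python sketch of the classifier-free guidance blend: each output logit moves from the guidance-context value toward the main-context value by a factor of `scale`, i.e. `logits[i] = logits_guidance[i] + scale * (logits[i] - logits_guidance[i])`. This is only the core arithmetic on plain lists, not the actual C implementation, which operates on the `c_float` buffers in place and may normalize the logits first.

```python
def apply_guidance_sketch(logits, logits_guidance, scale):
    """Blend main-context logits with negative-prompt (guidance) logits.

    scale == 1.0 returns the main-context logits unchanged (no guidance);
    scale > 1.0 pushes further away from the guidance-context logits.
    """
    return [
        g + scale * (l - g)
        for l, g in zip(logits, logits_guidance)
    ]

# scale == 1.0: no guidance, main-context logits pass through
print(apply_guidance_sketch([2.0, 0.5], [1.0, 1.0], 1.0))  # -> [2.0, 0.5]
# scale == 2.0: amplifies the difference from the negative-prompt logits
print(apply_guidance_sketch([2.0, 0.5], [1.0, 1.0], 2.0))  # -> [3.0, 0.0]
```

This matches the doc comment's statement that a scale of 1.0f means no guidance and higher values mean stronger guidance.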