# CHANGELOG.md
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0).
## [Unreleased]
## [0.1.71]
### Added
- (llama.cpp) Update llama.cpp
### Fixed
- (server) Fix several pydantic v2 migration bugs
## [0.1.70]
### Fixed
- (Llama.create_completion) Revert change so that `max_tokens` is not truncated to `context_size` in `create_completion`
- (server) Fixed changed settings field names from pydantic v2 migration
## [0.1.69]
### Added
- (server) Streaming requests are now interrupted prematurely when a concurrent request is made. This can be controlled with the `interrupt_requests` setting.
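The interruption behaviour described above can be sketched with a small illustrative pattern: a newer request signals an in-flight streaming generator to stop early. This is a minimal sketch of the idea, not the server's actual code; the `StreamInterrupter` class and its methods are hypothetical.

```python
import threading

class StreamInterrupter:
    """Illustrative sketch of the interrupt_requests behaviour:
    a concurrent request signals any in-flight streaming
    generator to stop early. Not the server's actual code."""

    def __init__(self, interrupt_requests: bool = True):
        self.interrupt_requests = interrupt_requests
        self._stop = threading.Event()

    def new_request(self) -> None:
        # Called when a concurrent request arrives.
        if self.interrupt_requests:
            self._stop.set()

    def stream(self, tokens):
        # Reset the flag when a new stream starts, then yield tokens
        # until a newer request interrupts us.
        self._stop.clear()
        for tok in tokens:
            if self._stop.is_set():
                break  # interrupted prematurely by a newer request
            yield tok
```

With `interrupt_requests=False`, `new_request()` becomes a no-op and the stream runs to completion.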
```python
# /// @details Apply classifier-free guidance to the logits as described in the paper "Stay on topic with Classifier-Free Guidance" https://arxiv.org/abs/2306.17806
# /// @param candidates A vector of `llama_token_data` containing the candidate tokens; the logits must be directly extracted from the original generation context without being sorted.
# /// @param guidance_ctx A separate context from the same model. Other than a negative prompt at the beginning, it should have all generated and user input tokens copied from the main context.
# /// @param scale Guidance strength. 1.0f means no guidance. Higher values mean stronger guidance.
# /// @param smooth_factor Smooth factor between guidance logits and original logits. 1.0f means only use guidance logits. 0.0f means only original logits.
```
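The mixing these parameters describe can be sketched as follows. This is an illustrative NumPy version of the guidance formula from the paper (working in log-probability space), not llama.cpp's actual implementation; the function name `cfg_mix` is hypothetical.

```python
import numpy as np

def cfg_mix(logits, guidance_logits, scale=1.5, smooth_factor=1.0):
    """Illustrative sketch of classifier-free guidance on logits.

    Mixes main-context logits with logits from a guidance context
    (negative prompt). scale == 1.0 leaves the distribution
    unchanged; smooth_factor == 1.0 uses only the guided logits.
    """
    # Normalize both sets of logits to log-probabilities so the
    # two contexts are directly comparable.
    log_p = logits - np.log(np.exp(logits).sum())
    log_p_guidance = guidance_logits - np.log(np.exp(guidance_logits).sum())
    # Push the distribution away from the guidance (negative-prompt)
    # context and toward the main context, scaled by `scale`.
    guided = log_p_guidance + scale * (log_p - log_p_guidance)
    # Interpolate between guided and original log-probabilities.
    return smooth_factor * guided + (1.0 - smooth_factor) * log_p
```

Note that with `scale=1.0` the result reduces to the original log-probabilities, matching "1.0f means no guidance" above.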
```python
description="Whether to interrupt requests when a new request is received.",
```

```diff
@@ -183,7 +179,7 @@ def get_settings():
     yield settings

-model_field = Field(description="The model to use for generating completions.")
+model_field = Field(description="The model to use for generating completions.", default=None)

 max_tokens_field = Field(
     default=16, ge=1, le=2048, description="The maximum number of tokens to generate."
```
```diff
@@ -247,21 +243,18 @@ def get_settings():
     default=0,
     ge=0,
     le=2,
-    description="Enable Mirostat constant-perplexity algorithm of the specified version (1 or 2; 0 = disabled)"
+    description="Enable Mirostat constant-perplexity algorithm of the specified version (1 or 2; 0 = disabled)",
 )

 mirostat_tau_field = Field(
     default=5.0,
     ge=0.0,
     le=10.0,
-    description="Mirostat target entropy, i.e. the target perplexity - lower values produce focused and coherent text, larger values produce more diverse and less coherent text"
+    description="Mirostat target entropy, i.e. the target perplexity - lower values produce focused and coherent text, larger values produce more diverse and less coherent text",
```
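The pydantic v2 fixes shown in the diffs above amount to making defaults explicit (in pydantic v2 an `Optional` field no longer implicitly gets a `None` default). A minimal sketch, assuming pydantic is installed; the `CreateCompletionRequest` model below is hypothetical and only mirrors the fields from the diff:

```python
from typing import Optional
from pydantic import BaseModel, Field

class CreateCompletionRequest(BaseModel):
    # Hypothetical model mirroring the fields in the diff above.
    # default=None must be explicit for an Optional field in pydantic v2.
    model: Optional[str] = Field(
        default=None, description="The model to use for generating completions."
    )
    max_tokens: int = Field(
        default=16, ge=1, le=2048,
        description="The maximum number of tokens to generate.",
    )
    mirostat_mode: int = Field(
        default=0, ge=0, le=2,
        description="Enable Mirostat constant-perplexity algorithm of the specified version (1 or 2; 0 = disabled)",
    )
```

With the explicit defaults, `CreateCompletionRequest()` constructs without arguments, which is the behaviour the migration fix restores.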