and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added

- (server) Streaming requests can now be interrupted prematurely when a concurrent request is made. This can be controlled with the `interrupt_requests` setting (see the sketch after this list).
- (server) Moved to fastapi v0.100.0 and pydantic v2
- (docker) Added a new "simple" image that builds llama.cpp from source when started.
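
A minimal sketch of how the new setting might be toggled when starting the server programmatically, assuming the `Settings`/`create_app` pair exported by `llama_cpp.server.app` at this version; the model path and port are illustrative:

```python
import uvicorn

from llama_cpp.server.app import Settings, create_app

settings = Settings(
    model="./models/7B/ggml-model.bin",  # illustrative path
    interrupt_requests=False,  # keep in-flight streams alive instead of
                               # cancelling them when a new request arrives
)
app = create_app(settings=settings)

uvicorn.run(app, host="127.0.0.1", port=8000)
```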

### Fixed

- (server) Performance improvements by avoiding unnecessary memory allocations during sampling

## [0.1.68]

### Added

- (llama.cpp) Update llama.cpp

## [0.1.67]

### Fixed

- Fix performance bug in Llama model by pre-allocating memory for tokens and logits.
- Fix bug in Llama model where the model was not freed after use (see the sketch below).
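
An illustrative, runnable sketch of the shape of this fix: tie the lifetime of the native llama.cpp context to the Python wrapper so it is released when the object is dropped. `llama_free` here is a stand-in for the real binding, and the handle wiring is an assumption, not the project's exact code:

```python
import ctypes
from typing import Optional


def llama_free(ctx: ctypes.c_void_p) -> None:
    """Stand-in for the native llama_cpp.llama_free binding."""
    print(f"freeing native context {ctx}")


class Llama:
    def __init__(self, model_path: str):
        self.model_path = model_path
        # Placeholder for the pointer returned by the llama.cpp C API.
        self.ctx: Optional[ctypes.c_void_p] = ctypes.c_void_p(1)

    def __del__(self):
        # The fix amounts to guaranteeing this cleanup runs: without it,
        # the native context leaks every time a model wrapper is discarded.
        if getattr(self, "ctx", None) is not None:
            llama_free(self.ctx)
            self.ctx = None
```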

## [0.1.66]

### Added

- (llama.cpp) New model API

### Fixed

- Performance issue during eval caused by a looped `np.concatenate` call (see the sketch below)
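
A small sketch of why the looped `np.concatenate` was slow and what the fix replaces it with, the same pre-allocation idea as the 0.1.67 token/logits change; the shapes and dtypes here are illustrative, not the model's actual ones:

```python
import numpy as np

n_steps, n_vocab = 512, 32000

# Before: each concatenate copies everything accumulated so far,
# so the loop does O(n^2) work in total.
scores = np.zeros((0, n_vocab), dtype=np.single)
for _ in range(n_steps):
    step_scores = np.random.rand(1, n_vocab).astype(np.single)
    scores = np.concatenate((scores, step_scores), axis=0)

# After: allocate the full buffer once, then fill rows in place (O(n)).
scores = np.empty((n_steps, n_vocab), dtype=np.single)
for i in range(n_steps):
    scores[i, :] = np.random.rand(n_vocab).astype(np.single)
```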