Update changelog · KoolSameer/llama-cpp-python@670fe4b · GitHub

Commit 670fe4b

Update changelog
1 parent 2472420 commit 670fe4b

1 file changed (+11, −5)

CHANGELOG.md

Lines changed: 11 additions & 5 deletions
@@ -7,30 +7,36 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
-## [Added]
+### Added
 
 - (server) Streaming requests are now interrupted prematurely when a concurrent request is made. Can be controlled with the `interrupt_requests` setting.
+- (server) Moved to fastapi v0.100.0 and pydantic v2
+- (docker) Added a new "simple" image that builds llama.cpp from source when started.
+
+## Fixed
+
+- (server) Performance improvements by avoiding unnecessary memory allocations during sampling
 
 ## [0.1.68]
 
-## [Added]
+### Added
 
 - (llama.cpp) Update llama.cpp
 
 ## [0.1.67]
 
-## Fixed
+### Fixed
 
 - Fix performance bug in Llama model by pre-allocating memory tokens and logits.
 - Fix bug in Llama model where the model was not freed after use.
 
 ## [0.1.66]
 
-## Added
+### Added
 
 - (llama.cpp) New model API
 
-## Fixed
+### Fixed
 
 - Performance issue during eval caused by looped np.concatenate call
 - State pickling issue when saving cache to disk
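
The `interrupt_requests` entry added under Unreleased refers to a server setting, so a brief illustration may help. This is a minimal sketch, not the project's documented usage: it assumes the OpenAI-compatible server exposes `Settings` and `create_app` in `llama_cpp.server.app`, as it did around this release, and that the setting can be passed there by name; the model path below is hypothetical.

```python
# Sketch only (assumed API shape for this release): start the server with the
# interrupt_requests setting turned off, so an in-flight streaming response is
# not cut short when a concurrent request arrives.
import uvicorn

from llama_cpp.server.app import Settings, create_app

settings = Settings(
    model="./models/ggml-model-q4_0.bin",  # hypothetical model path
    interrupt_requests=False,              # opt out of interrupting running streams
)
app = create_app(settings=settings)

if __name__ == "__main__":
    uvicorn.run(app, host="127.0.0.1", port=8000)
```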
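Several of the performance entries (pre-allocating tokens and logits in 0.1.67, the looped `np.concatenate` issue fixed in 0.1.66, and the sampling allocations noted under Unreleased) describe the same general pattern. The sketch below is a generic NumPy illustration of that pattern, not the library's actual internals; the array names and sizes are made up.

```python
# Illustrative only: growing an array with np.concatenate in a loop copies all
# previously stored rows on every iteration, while a pre-allocated buffer is
# written in place with no repeated copying.
import numpy as np

n_ctx, n_vocab = 2048, 32000

# Slow pattern: repeated concatenation (roughly quadratic copying overall).
scores = np.zeros((0, n_vocab), dtype=np.single)
for _ in range(8):
    new_row = np.random.rand(1, n_vocab).astype(np.single)
    scores = np.concatenate((scores, new_row), axis=0)

# Faster pattern: allocate the full buffer once, then fill rows in place.
scores = np.zeros((n_ctx, n_vocab), dtype=np.single)
for i in range(8):
    scores[i, :] = np.random.rand(n_vocab)
```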
