Tags: ggml-org/llama.cpp
gguf-v0.17.0
b5538
cmake: Guard GGML_CPU_ALL_VARIANTS by architecture (#13890)
b5537
llama : add support for jina-reranker-v2 (#13900)
b5535
arm64: optimize q4_k_q8_k kernel with i8mm (#13886)
This PR improves the q4_k_q8_k GEMM kernel using the arm64 i8mm instruction.
Tested on Neoverse-N2 with a Llama 3 8B Q4_K_M quantized model:
- 34% ~ 50% S_PP uplift for all batch sizes
- 12% ~ 37% S_TG uplift for batch sizes 4 and above
Perplexity is unchanged by this PR.
```
// tested on neoverse-n2
$ llama-batched-bench \
-m Meta-Llama-3-8B-Instruct-Q4_K_M.gguf \
--no-mmap -fa \
-c 8192 -b 4096 -ub 512 -npp 128 -ntg 128 \
-npl 1,2,4,8,16,32 \
-t 64
|    PP |     TG |    B | S_PP t/s | S_PP t/s | S_TG t/s | S_TG t/s |
|       |        |      | original |  this PR | original |  this PR |
|-------|--------|------|----------|----------|----------|----------|
|   128 |    128 |    1 |   110.12 |   147.83 |    24.36 |    24.28 |
|   128 |    128 |    2 |   121.16 |   172.42 |    46.36 |    47.93 |
|   128 |    128 |    4 |   120.15 |   169.75 |    74.68 |    84.00 |
|   128 |    128 |    8 |   130.97 |   196.81 |    91.04 |   114.74 |
|   128 |    128 |   16 |   131.01 |   196.88 |   101.43 |   135.79 |
|   128 |    128 |   32 |   130.85 |   196.51 |   106.97 |   147.29 |
```
b5534
cmake: Factor out CPU architecture detection (#13883)
* cmake: Define function for querying architecture
  The tests and results exactly match those of ggml/src/CMakeLists.txt
* Switch arch detection over to the new function
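Such a helper typically classifies the target CPU once and lets every build script reuse the answer instead of repeating ad-hoc checks. A minimal sketch of what a function like this could look like (hypothetical name and logic, not the PR's actual code):

```cmake
# Hypothetical sketch: classify the target architecture once, based on
# CMAKE_SYSTEM_PROCESSOR, instead of scattering regex checks around.
function(ggml_get_system_arch OUT_VAR)
    if (CMAKE_SYSTEM_PROCESSOR MATCHES "^(aarch64|arm64|ARM64)$")
        set(${OUT_VAR} "arm64" PARENT_SCOPE)
    elseif (CMAKE_SYSTEM_PROCESSOR MATCHES "^(x86_64|AMD64|amd64|i686)$")
        set(${OUT_VAR} "x86" PARENT_SCOPE)
    else()
        set(${OUT_VAR} "unknown" PARENT_SCOPE)
    endif()
endfunction()
```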
b5533
ggml: aarch64: Implement SVE F32 kernels for Mamba Sequential Scan Algorithm (#13882)
* F32-Mamba-Seq_Scan-SVE
* Fix formatting
* ggml : missing space
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b5532
tests : remove json.hpp from a test (#13880)
ggml-ci
b5530
llama : add RobertaForSequenceClassification reranker support (#13875)
b5529
ggml: aarch64: Implement SVE F32 kernels for vector functions (#13843)
* F32-Mamba-SVE
* F32-Mamba-SVE
* Resolve test errors-1
* Resolve test errors-2
* F32-vec-SVE
* F32-vec-SVE
* F32-vec-SVE
b5527
llama : fix KV shift for qwen2vl (#13870)
* llama : fix KV shift for qwen2vl
* add ref to the PR