Tags: ggml-org/llama.cpp
gguf-v0.17.0
b5538
cmake: Guard GGML_CPU_ALL_VARIANTS by architecture (#13890)
b5537
llama : add support for jina-reranker-v2 (#13900)
b5535
arm64: optimize q4_k_q8_k kernel with i8mm (#13886)
This PR improves the q4_k_q8_k GEMM kernel using the arm64 i8mm instruction.
Tested on Neoverse-N2 with a Llama 3 8B Q4_K_M quantized model:
- 34% ~ 50% S_PP uplift for all batch sizes
- 12% ~ 37% S_TG uplift for batch sizes 4 and above
Perplexity is unchanged by this PR.
```
// tested on neoverse-n2
$ llama-batched-bench \
-m Meta-Llama-3-8B-Instruct-Q4_K_M.gguf \
--no-mmap -fa \
-c 8192 -b 4096 -ub 512 -npp 128 -ntg 128 \
-npl 1,2,4,8,16,32 \
-t 64
|    PP |     TG |    B | S_PP t/s | S_PP t/s | S_TG t/s | S_TG t/s |
|       |        |      | original |  this PR | original |  this PR |
|-------|--------|------|----------|----------|----------|----------|
|   128 |    128 |    1 |   110.12 |   147.83 |    24.36 |    24.28 |
|   128 |    128 |    2 |   121.16 |   172.42 |    46.36 |    47.93 |
|   128 |    128 |    4 |   120.15 |   169.75 |    74.68 |    84.00 |
|   128 |    128 |    8 |   130.97 |   196.81 |    91.04 |   114.74 |
|   128 |    128 |   16 |   131.01 |   196.88 |   101.43 |   135.79 |
|   128 |    128 |   32 |   130.85 |   196.51 |   106.97 |   147.29 |
```
b5534
cmake: Factor out CPU architecture detection (#13883)
* cmake: Define function for querying architecture
  The tests and results exactly match those of ggml/src/CMakeLists.txt
* Switch arch detection over to the new function
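Such a helper typically classifies the target CPU once and lets every build script reuse the answer instead of repeating ad-hoc checks. A minimal sketch of what a function like this could look like (hypothetical name and logic, not the PR's actual code):

```cmake
# Hypothetical sketch: classify the target architecture once, based on
# CMAKE_SYSTEM_PROCESSOR, instead of scattering regex checks around.
function(ggml_get_system_arch OUT_VAR)
    if (CMAKE_SYSTEM_PROCESSOR MATCHES "^(aarch64|arm64|ARM64)$")
        set(${OUT_VAR} "arm64" PARENT_SCOPE)
    elseif (CMAKE_SYSTEM_PROCESSOR MATCHES "^(x86_64|AMD64|amd64|i686)$")
        set(${OUT_VAR} "x86" PARENT_SCOPE)
    else()
        set(${OUT_VAR} "unknown" PARENT_SCOPE)
    endif()
endfunction()
```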
b5533
ggml: aarch64: Implement SVE F32 kernels for Mamba Sequential Scan Algorithm (#13882)
* F32-Mamba-Seq_Scan-SVE
* Fix formatting
* ggml : missing space
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b5532
tests : remove json.hpp from a test (#13880)
ggml-ci
b5530
llama : add RobertaForSequenceClassification reranker support (#13875)
b5529
ggml: aarch64: Implement SVE F32 kernels for vector functions (#13843)
* F32-Mamba-SVE
* F32-Mamba-SVE
* Resolve test errors-1
* Resolve test errors-2
* F32-vec-SVE
* F32-vec-SVE
* F32-vec-SVE
b5527
llama : fix KV shift for qwen2vl (#13870)
* llama : fix KV shift for qwen2vl
* add ref to the PR