Tags · rpatil524/llama.cpp

b5583

vulkan: fix warnings in perf logger querypool code (ggml-org#13937)

Jun 3, 2025
7e00e60

b5581

opencl: add `backend_synchronize` (ggml-org#13939)

* This is not needed by the normal use where the result is read
  using `tensor_get`, but it allows perf mode of `test-backend-ops`
  to properly measure performance.

Jun 2, 2025
71e74a3

b5579

server : disable speculative decoding for SWA models (ggml-org#13970)

* server : use swa-full for draft context

ggml-ci

* server : disable speculative decoding for SWA models

Jun 2, 2025
3637576

b5575

mtmd : fix memory leak in mtmd_helper_eval_chunk_single (ggml-org#13961)

* mtmd : fix memory in mtmd_helper_eval_chunk_single

* mtmd-cli : fix mem leak

* Update tools/mtmd/mtmd-cli.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Jun 2, 2025
bfd3227

b5572

gguf: fix failure on version == 0 (ggml-org#13956)

Jun 1, 2025
7675c55

b5569

ggml: check if non-native endian model is being loaded (ggml-org#13943)

* gguf: prevent non-native endian models from being loaded

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* gguf: update error message

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* gguf: make the non-native endian check more verbose

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: move ggml_assert location

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: reword the endianness check error message

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

---------

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
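
The idea behind this kind of endianness check can be sketched in a few lines. This is a minimal illustration, not the code from the commit: the helper names `read_version` and `looks_byte_swapped` are hypothetical, and the heuristic assumes that valid GGUF version numbers are small, so a nonzero version whose low 16 bits are all zero was most likely read with the wrong byte order.

```python
import struct


def read_version(header: bytes) -> int:
    """Read the uint32 version that follows the 4-byte magic, in host byte order.

    Hypothetical helper for illustration only.
    """
    if len(header) < 8 or header[:4] != b"GGUF":
        raise ValueError("not a GGUF header")
    (version,) = struct.unpack("=I", header[4:8])
    return version


def looks_byte_swapped(version: int) -> bool:
    """Heuristic (an assumption of this sketch): real versions are small,
    so a nonzero value with empty low 16 bits suggests a byte-order mismatch."""
    return version != 0 and (version & 0xFFFF) == 0


# A header written in host byte order, and the same version with its bytes reversed.
native_header = b"GGUF" + struct.pack("=I", 3)
swapped_header = b"GGUF" + struct.pack("=I", 3)[::-1]

native_version = read_version(native_header)
swapped_version = read_version(swapped_header)
```

With these inputs, `looks_byte_swapped(native_version)` is false and `looks_byte_swapped(swapped_version)` is true, so a loader could refuse the second file with a message about mismatched endianness instead of misreading the rest of it.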

Jun 1, 2025
e57bb87

b5561

readme : update bindings (ggml-org#13950)

Jun 1, 2025
8726392

b5558

threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling (ggml-org#12995)

* threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling

We talked about adding LOW priority for GGML threads in the original threadpool PR.
It might be useful for some cases to avoid contention.

Latest Windows ARM64 releases started parking (offlining) the CPU cores
more aggressively, which results in suboptimal performance with n_threads > 4.
To deal with that we now disable Power Throttling for our threads for the NORMAL
and higher priorities.

Co-authored-by: Diego Devesa <slarengh@gmail.com>

* threading: disable SetThreadInfo() calls for older Windows versions

* Update tools/llama-bench/llama-bench.cpp

Co-authored-by: Diego Devesa <slarengh@gmail.com>

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>

May 31, 2025
053b153

b5557

docs : Note about necessity of having libcurl installed for standard build. (ggml-org#13945)

Signed-off-by: Jiri Podivin <jpodivin@gmail.com>

May 31, 2025
b3a89c3

b5555

llama : deprecate explicit kv_self defrag/update calls (ggml-org#13921)

ggml-ci

May 31, 2025
803f8ba
