Tags · jeffbolznv/llama.cpp

b5558

threading: support for GGML_SCHED_PRIO_LOW, update thread info on Win…

…dows to avoid throttling (ggml-org#12995)

* threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling

We talked about adding LOW priority for GGML threads in the original threadpool PR.
It might be useful for some cases to avoid contention.

Latest Windows ARM64 releases started parking (offlining) the CPU cores
more aggresively which results in suboptimal performance with n_threads > 4.
To deal with that we now disable Power Throttling for our threads for the NORMAL
and higher priorities.

Co-authored-by: Diego Devesa <slarengh@gmail.com>

* threading: disable SetThreadInfo() calls for older Windows versions

* Update tools/llama-bench/llama-bench.cpp

Co-authored-by: Diego Devesa <slarengh@gmail.com>

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>

May 31, 2025
053b153
zip
tar.gz
Downloads

b5548

CUDA: fix typo in FlashAttention code (ggml-org#13926)

May 30, 2025
e562eec
zip
tar.gz
Downloads

b5481

server: fix/test add_generation_prompt (ggml-org#13770)

Co-authored-by: ochafik <ochafik@google.com>

May 25, 2025
d785f9c
zip
tar.gz
Downloads

b5449

hparams : support models for which all layers use SWA (ggml-org#13682)

ggml-ci

May 21, 2025
8e186ef
zip
tar.gz
Downloads

b5415

server : added --no-prefill-assistant flag (ggml-org#13608)

* added no-prefill-assistant flag

* reworded documentation comment

* updated server README.md

May 17, 2025
6a2bc8b
zip
tar.gz
Downloads

b5381

fix: Move build_inp_pos to the top of the graph section for build_gra…

…nite (ggml-org#13538)

This matches how others do it, but will still avoid the extra
initialization when rope is disabled.

Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

May 14, 2025
5e7d95e
zip
tar.gz
Downloads

b5343

docs : Fix typo in InternVL3 model name (ggml-org#13440)

May 10, 2025
62d4250
zip
tar.gz
Downloads

b5287

CUDA: fix logic for clearing padding with -ngl 0 (ggml-org#13320)

May 5, 2025
9070365
zip
tar.gz
Downloads

b5255

ci: fix cross-compile sync issues (ggml-org#12804)

May 1, 2025
d24d592
zip
tar.gz
Downloads

b5223

server : Prefilling assistant message in openai compatible API (ggml-…

…org#13174)

* Prefilling assistant message in openai compatible API

* fixed indentation

* fixed code convention

* simplify method usage

* no more than one assistant message at end of messages

* merge checks into prefill code

* Update examples/server/utils.hpp

---------

Co-authored-by: matteo <matteo@naspc.lan>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

Apr 29, 2025
e2e1ddb
zip
tar.gz
Downloads

PreviousNext

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

b5558

b5548

b5481

b5449

b5415

b5381

b5343

b5287

b5255

b5223

Tags: jeffbolznv/llama.cpp