8000 Tags · jeffbolznv/llama.cpp · GitHub

Tags

b5558

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling (ggml-org#12995)

* threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling

We talked about adding LOW priority for GGML threads in the original threadpool PR.
It might be useful for some cases to avoid contention.

The latest Windows ARM64 releases started parking (offlining) CPU cores
more aggressively, which results in suboptimal performance with n_threads > 4.
To deal with that, we now disable Power Throttling for our threads at the NORMAL
and higher priorities.

Co-authored-by: Diego Devesa <slarengh@gmail.com>

* threading: disable SetThreadInfo() calls for older Windows versions

* Update tools/llama-bench/llama-bench.cpp

Co-authored-by: Diego Devesa <slarengh@gmail.com>

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>

b5548

CUDA: fix typo in FlashAttention code (ggml-org#13926)

b5481

server: fix/test add_generation_prompt (ggml-org#13770)

Co-authored-by: ochafik <ochafik@google.com>

b5449

hparams : support models for which all layers use SWA (ggml-org#13682)

ggml-ci

b5415

server : added --no-prefill-assistant flag (ggml-org#13608)

* added no-prefill-assistant flag

* reworded documentation comment

* updated server README.md

b5381

fix: Move build_inp_pos to the top of the graph section for build_granite (ggml-org#13538)

This matches how other models do it, but still avoids the extra
initialization when rope is disabled.

Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

b5343

docs : Fix typo in InternVL3 model name (ggml-org#13440)

b5287

CUDA: fix logic for clearing padding with -ngl 0 (ggml-org#13320)

b5255

ci: fix cross-compile sync issues (ggml-org#12804)

b5223

server : Prefilling assistant message in OpenAI-compatible API (ggml-org#13174)

* Prefilling assistant message in openai compatible API

* fixed indentation

* fixed code convention

* simplify method usage

* no more than one assistant message at end of messages

* merge checks into prefill code

* Update examples/server/utils.hpp

---------

Co-authored-by: matteo <matteo@naspc.lan>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>