Tags · rpatil524/llama.cpp · GitHub

Tags: rpatil524/llama.cpp

Tags

b5583

Verified

vulkan: fix warnings in perf logger querypool code (ggml-org#13937)

b5581

Verified

opencl: add `backend_synchronize` (ggml-org#13939)

* This is not needed for normal use, where the result is read
  using `tensor_get`, but it allows the perf mode of `test-backend-ops`
  to properly measure performance.
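
A backend "synchronize" hook for an OpenCL backend typically just blocks the host until everything enqueued on the backend's command queue has finished, which is what lets a benchmark harness time the actual device work. A minimal sketch of that mechanism, assuming a hypothetical context struct (`my_opencl_context`) and hook name (`my_backend_synchronize`) rather than the actual ggml-opencl internals:

```cpp
#include <CL/cl.h>

// Hypothetical backend context; the real ggml OpenCL backend keeps its
// queue elsewhere, this struct exists only for illustration.
struct my_opencl_context {
    cl_command_queue queue; // queue the backend enqueues kernels on
};

// Synchronize hook: block until all previously enqueued commands finish,
// so any timing taken afterwards reflects completed device work.
static void my_backend_synchronize(my_opencl_context * ctx) {
    clFinish(ctx->queue);
}
```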

b5579

Verified

server : disable speculative decoding for SWA models (ggml-org#13970)

* server : use swa-full for draft context

ggml-ci

* server : disable speculative decoding for SWA models

b5575

Verified

mtmd : fix memory leak in mtmd_helper_eval_chunk_single (ggml-org#13961)

* mtmd : fix memory leak in mtmd_helper_eval_chunk_single

* mtmd-cli : fix mem leak

* Update tools/mtmd/mtmd-cli.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

b5572

Verified

gguf: fix failure on version == 0 (ggml-org#13956)

b5569

Verified

ggml: check if non-native endian model is being loaded (ggml-org#13943)

* gguf: prevent non-native endian models from being loaded

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* gguf: update error message

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* gguf: make the non-native endian check more verbose

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: move ggml_assert location

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: reword the endianness check error message

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

---------

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
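
The b5572 and b5569 entries above both concern validating the GGUF header. A rough sketch of how a loader can reject a version-0 or byte-swapped (non-native endian) file, assuming the standard GGUF layout of a 4-byte magic followed by a 32-bit version; the names and error messages here are illustrative, not the actual ggml code:

```cpp
#include <cstdint>
#include <cstdio>
#include <stdexcept>

// Swap the byte order of a 32-bit value.
static uint32_t byteswap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Illustrative header check: reject GGUF files whose version field is 0 or
// only looks sane after byte-swapping (i.e. written with the opposite
// endianness to the host).
static void check_gguf_header(std::FILE * f) {
    char     magic[4];
    uint32_t version = 0;
    if (std::fread(magic, 1, sizeof(magic), f) != sizeof(magic) ||
        std::fread(&version, sizeof(version), 1, f) != 1) {
        throw std::runtime_error("failed to read GGUF header");
    }
    if (version == 0) {
        throw std::runtime_error("invalid GGUF version 0");
    }
    // A real version is a small integer, so a huge value that becomes small
    // after swapping almost certainly means a non-native endian model.
    if (version > 0xFFFF && byteswap32(version) <= 0xFFFF) {
        throw std::runtime_error(
            "model endianness does not match this machine; convert the model first");
    }
}
```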

b5561

Verified

readme : update bindings (ggml-org#13950)

b5558

Verified

threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling (ggml-org#12995)

* threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling

We talked about adding LOW priority for GGML threads in the original threadpool PR.
It might be useful in some cases to avoid contention.

Latest Windows ARM64 releases started parking (offlining) the CPU cores
more aggressively, which results in suboptimal performance with n_threads > 4.
To deal with that, we now disable Power Throttling for our threads at NORMAL
and higher priorities.

Co-authored-by: Diego Devesa <slarengh@gmail.com>

* threading: disable SetThreadInfo() calls for older Windows versions

* Update tools/llama-bench/llama-bench.cpp

Co-authored-by: Diego Devesa <slarengh@gmail.com>

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
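
For context on the Power Throttling behavior described in the b5558 commit message above: on Windows, a thread can opt out of power throttling via `SetThreadInformation`. This is a minimal sketch of that mechanism, assuming a Windows SDK recent enough to declare `THREAD_POWER_THROTTLING_STATE`; it is not the code from the PR, and on older Windows versions the call simply fails and can be ignored:

```cpp
#include <windows.h>

// Opt the current thread out of Power Throttling so the OS does not
// down-clock or park it. Requires Windows 10 1709+ / a matching SDK.
static bool disable_power_throttling_current_thread(void) {
    THREAD_POWER_THROTTLING_STATE state;
    ZeroMemory(&state, sizeof(state));
    state.Version     = THREAD_POWER_THROTTLING_CURRENT_VERSION;
    state.ControlMask = THREAD_POWER_THROTTLING_EXECUTION_SPEED;
    state.StateMask   = 0; // 0 = disable throttling for the masked control

    // Returns FALSE on older Windows versions that do not support this
    // information class; callers can simply ignore the failure.
    return SetThreadInformation(GetCurrentThread(), ThreadPowerThrottling,
                                &state, sizeof(state)) != FALSE;
}
```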

b5557

Verified

docs : Note about necessity of having libcurl installed for standard build. (ggml-org#13945)

Signed-off-by: Jiri Podivin <jpodivin@gmail.com>

b5555

Verified

llama : deprecate explicit kv_self defrag/update calls (ggml-org#13921)

ggml-ci