Tags · ngxson/llama.cpp · GitHub

b5943

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
ggml: adds CONV_2D op and direct GEMM Vulkan implementation (ggml-org#14316)

* ggml/ggml-vulkan/test-backend-ops: adds CONV_2D for Vulkan

* ggml-vulkan: adds f32 scalar shader to compute 2D convolution directly
with GEMM (no need for im2col)

* test-backend-ops: adds test_case_ref to check the validity/performance of ops
against reference implementations having different graphs, adds tests

* Performance fixes: minimized branch divergence, used collectives to
  eliminate redundant calculation, removed macros.

* Kernel shared memory size check

* Updates test-backend-ops to support graphs for performance
  measurement.

* Apple/Win32 compile errors fixed

* Subgroup size used to determine tile size -> fixes llvmpipe errors.

* Collectives disabled by default.

* Intel support is disabled as the performance is poor.

* Conv2d enabled for Intel with disabled collectives, disabled for Apple

* test-backend-ops modifications are reverted

* Trailing spaces and missing override fixed.

* Triggering pipeline relaunch.

* Code formatted with .clang-format.
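For context on the change above: the usual pipeline materializes an im2col buffer (one column per output position) and then runs a matmul, while the new shader computes the same GEMM directly from the input. The algebraic equivalence it relies on can be sketched in plain Python — this is not the Vulkan shader itself, and the sizes and diagonal test kernel are purely illustrative:

```python
def conv2d_direct(x, w):
    # Valid 2D convolution (no padding, stride 1) via nested loops.
    n, k = len(x), len(w)
    m = n - k + 1
    return [[sum(w[ky][kx] * x[oy + ky][ox + kx]
                 for ky in range(k) for kx in range(k))
             for ox in range(m)] for oy in range(m)]

def conv2d_im2col_gemm(x, w):
    # im2col: one row per kernel tap, one column per output position,
    # then the whole convolution is a single (1 x k*k) @ (k*k x m*m) GEMM.
    n, k = len(x), len(w)
    m = n - k + 1
    col = [[x[oy + ky][ox + kx]
            for oy in range(m) for ox in range(m)]
           for ky in range(k) for kx in range(k)]
    wrow = [w[ky][kx] for ky in range(k) for kx in range(k)]
    flat = [sum(wr * col[r][c] for r, wr in enumerate(wrow))
            for c in range(m * m)]
    return [flat[i * m:(i + 1) * m] for i in range(m)]

x = [[float(i * 4 + j) for j in range(4)] for i in range(4)]
w = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
assert conv2d_direct(x, w) == conv2d_im2col_gemm(x, w)
```

Computing the GEMM "directly" means fusing the `col` indexing into the matmul loop, so the staging buffer never has to exist in memory.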

b5942

imatrix : use GGUF to store importance matrices (ggml-org#9400)

* imatrix : allow processing multiple chunks per batch

* perplexity : simplify filling the batch

* imatrix : fix segfault when using a single chunk per batch

* imatrix : use GGUF to store imatrix data

* imatrix : fix conversion problems

* imatrix : use FMA and sort tensor names

* py : add requirements for legacy imatrix convert script

* perplexity : revert changes

* py : include imatrix converter requirements in toplevel requirements

* imatrix : avoid using designated initializers in C++

* imatrix : remove unused n_entries

* imatrix : allow loading mis-ordered tensors

Sums and counts tensors no longer need to be consecutive.

* imatrix : more sanity checks when loading multiple imatrix files

* imatrix : use ggml_format_name instead of std::string concatenation

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

* quantize : use unused imatrix chunk_size with LLAMA_TRACE

* common : use GGUF for imatrix output by default

* imatrix : two-way conversion between old format and GGUF

* convert : remove imatrix to gguf python script

* imatrix : use the function name in more error messages

* imatrix : don't use FMA explicitly

This should make comparisons between the formats easier
because this matches the behavior of the previous version.

* imatrix : avoid returning from void function save_imatrix

* imatrix : support 3d tensors with MUL_MAT

* quantize : fix dataset name loading from gguf imatrix

* common : move string_remove_suffix from quantize and imatrix

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* imatrix : add warning when legacy format is written

* imatrix : warn when writing partial data, to help guess dataset coverage

Also make the legacy format store partial data
by using neutral values for missing data.
This matches what is done at read-time for the new format,
and so should get the same quality in case the old format is still used.

* imatrix : avoid loading model to convert or combine imatrix

* imatrix : avoid using imatrix.dat in README

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
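On why the sums and counts tensors mentioned above are stored separately: an importance-matrix entry can be modeled as a running sum of squared activations plus a contribution count, which makes combining multiple imatrix files exact rather than approximate. A minimal Python sketch — the class name and API are hypothetical, not llama.cpp's actual imatrix code:

```python
class ImatrixEntry:
    def __init__(self, n_cols):
        self.sums = [0.0] * n_cols   # running sum of squared activations
        self.counts = 0              # how many rows contributed

    def accumulate(self, rows):
        for row in rows:
            for j, a in enumerate(row):
                self.sums[j] += a * a
            self.counts += 1

    def merge(self, other):
        # combining two files is exact: sums and counts simply add
        self.sums = [s + o for s, o in zip(self.sums, other.sums)]
        self.counts += other.counts

    def mean(self):
        return [s / self.counts for s in self.sums]

full = ImatrixEntry(2)
full.accumulate([[1.0, 2.0], [3.0, 4.0]])
half_a, half_b = ImatrixEntry(2), ImatrixEntry(2)
half_a.accumulate([[1.0, 2.0]])
half_b.accumulate([[3.0, 4.0]])
half_a.merge(half_b)
assert half_a.sums == full.sums and half_a.counts == full.counts
```

Storing a normalized mean instead would lose the weighting information needed to merge files, which is presumably why the raw sums/counts pair goes into the GGUF.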

b5941

vulkan: Add logging for bf16 features to ggml_vk_print_gpu_info (ggml-org#13274) (ggml-org#14707)

b5940

Vulkan: Fix fprintf format-security warning (ggml-org#14770)

b5937

metal : fuse add, mul + add tests (ggml-org#14596)

ggml-ci

b5936

graph : fix graph reuse reset of params (ggml-org#14760)

ggml-ci

b5935

parallel : add option for different RNG seeds (ggml-org#14757)

ggml-ci

b5934

cuda : Fix Gemma3n not executed as CUDA_GRAPH on NVGPUs (ggml-org#14741)

* Fix Gemma3n not executed as CUDA_GRAPH on NVGPUs

Gemma3n uses matrix-matrix addition as part of its input processing,
wrongly triggering CUDA_GRAPH disablement on NVGPUs even when a batch
size of 1 is used.

* Exclude `project_per_layer_input` by matching node names

This ensures that all other graphs which don't exhibit this pattern do
not have their behavior changed.

* Revert unnecessary formatting changes
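The name-matching exclusion described above can be sketched as follows. The function name, signature, and heuristic structure here are hypothetical; only the `project_per_layer_input` node name comes from the commit:

```python
# Node names whose mat-mat ADDs should NOT disable CUDA graphs
# ("project_per_layer_input" is the Gemma3n node named in the commit).
EXCLUDED_NODE_NAMES = ("project_per_layer_input",)

def should_disable_cuda_graph(op, node_name, is_mat_mat_add):
    # Heuristic: a matrix-matrix ADD normally signals batched decoding,
    # which the graph capture path here does not handle, so graphs are
    # disabled -- except for named nodes known to add matrices even at
    # batch size 1.
    if op == "ADD" and is_mat_mat_add:
        return not any(n in node_name for n in EXCLUDED_NODE_NAMES)
    return False

assert should_disable_cuda_graph("ADD", "ffn_out-3", True)
assert not should_disable_cuda_graph("ADD", "project_per_layer_input", True)
```

Matching by substring (rather than exact name) covers per-layer suffixes like `-0`, `-1` that graph node names often carry, while leaving every other graph's behavior unchanged.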

b5933

graph : avoid huge warm-up graphs for MoE models (ggml-org#14753)

* graph : avoid huge warm-up graphs for MoE models

ggml-ci

* cont : bump max nodes to 8x model tensors

b5932

model : fix build after merge conflict (ggml-org#14754)
