Tags · Pints-AI/llama.cpp

b1892

finetune : add training data file to log message (ggml-org#4979)

This commit adds the name of the training data file to the log message
printed when the training data is tokenized.

The motivation for this change is that it can be useful to show which
file is being tokenized when running the finetune example.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

Jan 16, 2024
cec8a48

b1891

ggml : importance matrix support for legacy quants (ggml-org#4969)

* imatrix: adding support for legacy quants

* imatrix: guard Q4_0/Q5_0 against ffn_down craziness

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

Jan 16, 2024
334a835

b1889

perplexity : fix kv cache handling for hellaswag (ggml-org#4981)

ggml-ci

Jan 16, 2024
959ef0c

b1887

metal : localized logic in `ggml_metal_graph_compute` (ggml-org#4924)

* Metal: Localized logic in `ggml_metal_graph_compute`, minor performance improvement

* Whitespace

* Collecting command buffer completions on single thread

* Whitespace

* Reduce diff noise

Jan 16, 2024
158f8c9

b1886

android : introduce starter project example (ggml-org#4926)

* Introduce starter project for Android

Based on examples/llama.swiftui.

* Add github workflow

* Set NDK version

* Only build arm64-v8a in CI

* Sync bench code

* Rename CI prop to skip-armeabi-v7a

* Remove unused tests

Jan 16, 2024
862f5e4

b1885

metal : replace loop of dispatch_async with dispatch_apply (ggml-org#4934)

* Replace loop of dispatch_async with dispatch_apply

* Update ggml-metal.m

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Jan 16, 2024
3a48d55

b1884

metal : log `recommendedMaxWorkingSetSize` on iOS 16+ (ggml-org#4936)

* metal: Log `recommendedMaxWorkingSetSize` on iOS 16+

* Only log on iOS and macOS, ignoring tvOS and other platforms

* Check for Xcode version before using recommendedMaxWorkingSetSize

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Jan 16, 2024
7c8d3ab

b1882

ggml : introduce GGML_CALL function annotation (ggml-org#4850)

This change makes it possible to build ggml-cuda.cu and ggml-metal.m as
independent dynamic shared objects that may be conditionally linked at
runtime in a multiplatform binary. It introduces a GGML_CALL annotation
that documents which functions have a cyclic call relationship between
the application code and GPU modules.

This change does nothing unless the build defines -DGGML_MULTIPLATFORM,
which causes back-references and function pointers to conform to the MS
ABI, which is supported by NVCC, ROCm, Xcode, GCC and Clang across
platforms.
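
As a rough illustration of the idea described above, the sketch below shows how such an annotation could be defined and applied. It is not taken from the commit. The macro body is an assumption about how a GGML_CALL-style annotation might look, and `demo_log_fn`, `demo_log`, and `demo_module_run` are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed shape of the annotation: a no-op unless GGML_MULTIPLATFORM is
 * defined, in which case non-Windows x86-64 builds switch the annotated
 * functions to the MS ABI. Windows already uses that ABI by default. */
#if defined(GGML_MULTIPLATFORM) && !defined(_WIN32) && defined(__x86_64__)
#  define GGML_CALL __attribute__((__ms_abi__))
#else
#  define GGML_CALL
#endif

/* Hypothetical callback type: a function pointer that crosses the boundary
 * between the application and a separately built GPU module. */
typedef int (GGML_CALL *demo_log_fn)(int level);

/* Hypothetical application-side callback. It carries the same annotation
 * as the pointer type so both sides agree on the calling convention. */
static GGML_CALL int demo_log(int level) {
    return level + 1;
}

/* Hypothetical module-side entry point that calls back into the
 * application through the annotated function pointer. */
GGML_CALL int demo_module_run(demo_log_fn cb, int level) {
    assert(cb != NULL);
    return cb(level);
}
```

Calling `demo_module_run(demo_log, 41)` invokes the callback through the annotated pointer. Without `-DGGML_MULTIPLATFORM` the macro expands to nothing, so the code compiles exactly as if the annotation were absent.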

Jan 16, 2024
a0b3ac8

b1881

finetune : use LLAMA_FILE_MAGIC_GGLA (ggml-org#4961)

This commit replaces the magic number LLAMA_FILE_MAGIC_LORA used in
finetune.cpp with LLAMA_FILE_MAGIC_GGLA defined in llama.h.
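
For readers unfamiliar with file magics, here is a minimal sketch of how a four-byte magic can be checked at the start of a buffer. It is not the code from finetune.cpp. The constant value is an assumption (the ASCII bytes 'g', 'g', 'l', 'a' packed into a 32-bit integer), and `DEMO_FILE_MAGIC_GGLA` and `demo_has_ggla_magic` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed value: the ASCII bytes 'g','g','l','a' packed into a uint32. */
#define DEMO_FILE_MAGIC_GGLA 0x67676c61u

/* Hypothetical helper: returns true when buf begins with the magic,
 * read as a 32-bit integer in the host's native byte order. */
static bool demo_has_ggla_magic(const void *buf, size_t len) {
    uint32_t magic;
    assert(buf != NULL);
    if (len < sizeof magic) {
        return false;               /* too short to contain a magic */
    }
    memcpy(&magic, buf, sizeof magic); /* avoids unaligned access */
    return magic == DEMO_FILE_MAGIC_GGLA;
}
```

Referring to one named constant from a shared header, rather than a locally defined number, keeps the writer and the reader of the file format in agreement.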

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

Jan 16, 2024
d75c232

b1880

speculative : threading options (ggml-org#4959)

* speculative: expose draft threading

* fix usage format

* accept -td and -tbd args

* speculative: revert default behavior when -td is unspecified

* fix trailing whitespace

Jan 16, 2024
e032428
