Tags · ochafik/llama.cpp

b5537

llama : add support for jina-reranker-v2 (ggml-org#13900)

May 29, 2025
e83ba3e

b5500

scripts : add option to compare commits in Debug (ggml-org#13806)

* scripts : add option to compare commits in Debug

* cont : reuse existing CMAKE_OPTS

May 26, 2025
a26c4cc

b5497

server: fix streaming crashes (ggml-org#13786)

* add preludes to content on partial regex match

* allow all parsers to parse non-tool-call content.

* tweak order of <|python_tag|> vs <function= parsing for functionary v3.1 format. still not ideal but hopefully less prone to crash

May 26, 2025
03f582a

b5495

`server`: fix format of streamed tool call deltas (diff name, fix id location) (ggml-org#13800)

* fix deltas of tool_call.function.name

* fix tool_call.id (was in tool_call.function.id!) + add function type

* add tool_call.type

* populate empty tool_call.function.arguments on first delta
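For readers consuming these streamed deltas, the following is a minimal client-side sketch of how they might be merged back into complete tool calls. It assumes the OpenAI-style delta layout that the bullets above describe (`id` and `type` on the tool call itself, `name` and `arguments` under `function`, and an `index` field identifying each call). The helper name `merge_tool_call_deltas` and the sample data are hypothetical and are not part of llama.cpp.

```python
import json

def merge_tool_call_deltas(deltas):
    """Merge streamed tool_call deltas into complete tool calls.

    Assumes each delta is a dict with an integer "index" and, optionally,
    "id", "type", and a "function" dict holding "name" and/or "arguments".
    """
    calls = {}
    for delta in deltas:
        call = calls.setdefault(
            delta["index"],
            {"id": None, "type": None, "function": {"name": "", "arguments": ""}},
        )
        # id and type live on the tool call itself, not under "function".
        if delta.get("id"):
            call["id"] = delta["id"]
        if delta.get("type"):
            call["type"] = delta["type"]
        fn = delta.get("function", {})
        # Assumption: the name arrives whole in a single delta.
        if fn.get("name"):
            call["function"]["name"] = fn["name"]
        # Arguments arrive as string fragments and are concatenated in order.
        call["function"]["arguments"] += fn.get("arguments", "")
    return [calls[i] for i in sorted(calls)]

# Hypothetical sample stream: the first delta carries id, type, name and an
# empty arguments string; later deltas append argument fragments.
sample_deltas = [
    {"index": 0, "id": "call_1", "type": "function",
     "function": {"name": "get_weather", "arguments": ""}},
    {"index": 0, "function": {"arguments": "{\"city\":"}},
    {"index": 0, "function": {"arguments": " \"Paris\"}"}},
]
merged = merge_tool_call_deltas(sample_deltas)
parsed_arguments = json.loads(merged[0]["function"]["arguments"])
```

Sending an empty `arguments` string on the first delta means a client like this one can always concatenate without checking whether the field exists yet.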

May 26, 2025
d74e94c

b5494

server: fix regression on streamed non-chat completion w/ stops (ggml-org#13785)

* more forgiving message diffs: partial stop words aren't erased, full stops are

* Add (slow) server test for completion + stream + stop

May 26, 2025
f13847c

b5493

examples : allow extracting embeddings from decoder contexts (ggml-org#13797)

ggml-ci

May 26, 2025
79c137f

b5488

`server`: add `--reasoning-budget 0` to disable thinking (incl. qwen3 w/ enable_thinking:false) (ggml-org#13771)

---------

Co-authored-by: ochafik <ochafik@google.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

May 25, 2025
e121edc

b5479

server: fix/test add_generation_prompt

May 25, 2025
7cea29b

b5478

`server`: streaming of tool calls and thoughts when `--jinja` is on (ggml-org#12379)

* add common_json w/ support for truncated json healing

* add common_chat_msg_diff

* partial common_chat_parse

* refactor parser w/ optionals

* server: wire chat diffs in stream mode

* fix trigger of thinking models (must happen after thoughts are closed)

* fix functionary v3.2 raw python!

* rename: common_chat_syntax (now contains format)

* rm common_regex.at_start

* don't return empty <think></think>

* accommodate yet another deepseek r1 distill fantasy syntax (`<｜tool▁calls｜>`)

* fix QwQ 32B tool call parsing after thoughts (hermes2)

* better logs for grammar triggers

* consume spaces after parse_json_tool_calls

* fix required tool calls w/ thinking models that have pre-opened thinking tags

* fix thinking model's initial trigger + test qwq's template

* run most test_tool_call tests in stream + non-stream modes

* make functionary v3.2 parsing more strict (differentiate first match from others)

* send final diff from server, to close off raw python arguments

* support partial content streaming in Generic mode

* tool-call: allow content prelude before hermes2 tool calls (for Qwen2.5)

* Update function-calling.md

* Update tool_bench.py

* chat-parser: remove input from exception (llm output may contain PII)

---------

Co-authored-by: ochafik <ochafik@google.com>
Co-authored-by: Olivier Chafik <ochafik@users.noreply.github.com>
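The "truncated json healing" item in this entry refers to making a partially streamed JSON document parseable before it is complete. The sketch below illustrates the general idea only and is not the `common_json` implementation, whose details are not given here. The function name `heal_truncated_json` is hypothetical, and the approach is deliberately simplified: it only handles input cut off inside a string value or directly after a complete value, not input cut off inside a key, after a colon or comma, or in the middle of a literal or `\u` escape.

```python
import json

def heal_truncated_json(text):
    """Close any open string and any open objects/arrays in truncated JSON.

    Simplified sketch: works when the cut falls inside a string value or
    right after a complete value.
    """
    closers = []        # closing brackets still owed, innermost last
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            closers.append("}")
        elif ch == "[":
            closers.append("]")
        elif ch in "}]" and closers:
            closers.pop()
    healed = text
    if in_string:
        if escaped:
            healed = healed[:-1]   # drop a dangling backslash
        healed += '"'
    # Close the innermost container first.
    return healed + "".join(reversed(closers))

partial = '{"name": "get_weather", "arguments": {"city": "Par'
healed = heal_truncated_json(partial)
parsed = json.loads(healed)
```

A streaming server could apply something like this to each partial output, then diff the parsed result against the previous one to decide what to send next, which appears to be the role of the message diffs mentioned in the same entry.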

May 25, 2025
f5cd27b

b5470

ci : enable winget package updates (ggml-org#13734)

May 23, 2025
b775345
