Eval bug: Long text numbers/words in prompt breaks llama.cpp permanently in parallel mode with flash attention · Issue #12758 · ggml-org/llama.cpp · GitHub
In my case it fails with -np 3 -fa, but works OK with -np 1 -fa and with -np 3 (without -fa).
Symptoms:
After executing a prompt that contains a really long number, all subsequent prompt executions run "forever" without a correct answer (by "forever" I mean until the number of tokens to predict is reached).
The symptoms do not fix themselves; restarting llama-server is the only solution.
I also checked old versions back to the first one supporting Gemma 3 (b4997, b4923, b4898, b4875): all have the error.
jj123451 changed the title from "Eval bug: Long text numbers in prompt breaks llama.cpp permanently in parallel mode" to "Eval bug: Long text numbers/words in prompt breaks llama.cpp permanently in parallel mode with flash attention" on Apr 4, 2025.
Name and Version
version: 5050 (23106f9)
built with MSVC 19.29.30158.0 for
(https://github.com/ggml-org/llama.cpp/releases/download/b5050/llama-b5050-bin-win-cuda-cu12.4-x64.zip)
Operating systems
Windows
GGML backends
CUDA
Hardware
i5 13600K + RTX 4080
Models
google_gemma-3-12b-it-IQ4_XS
(https://huggingface.co/bartowski/google_gemma-3-12b-it-GGUF/blob/main/google_gemma-3-12b-it-IQ4_XS.gguf)
Problem description & steps to reproduce
Prerequisite:
-np 3 -fa (fails with -np 3 -fa; works OK with -np 1 -fa and with -np 3, i.e. without -fa)
Symptoms:
After a prompt containing a really long number, all subsequent prompts run until the number of tokens to predict is reached instead of answering.
Recreate steps:
llama-server.exe -m "google_gemma-3-12b-it-IQ4_XS.gguf" --port 8087 --api-key "empty" -n 1000 -fa -lv 0 -ngl 999 -c 21000 -np 3 --temp 1.0 --top-k 64 --top-p 0.95
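For readability, here is the same launch command with the flags most relevant to the bug annotated (flag meanings as I understand them from llama.cpp's --help; treat the comments as a sketch, not authoritative documentation):

```shell
# -np 3   : three parallel slots (bug only triggers with -np > 1)
# -fa     : flash attention (bug only triggers with -fa)
# -n 1000 : max tokens to predict; the "forever" runs stop at this limit
# -c 21000, -ngl 999, -lv 0 : context size, GPU layers to offload, log verbosity
llama-server.exe -m "google_gemma-3-12b-it-IQ4_XS.gguf" --port 8087 --api-key "empty" -n 1000 -fa -lv 0 -ngl 999 -c 21000 -np 3 --temp 1.0 --top-k 64 --top-p 0.95
```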
test.txt, e.g.
uv venv . --python 3.12
uv pip install openai
Scripts\activate
python.exe .\test.py
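The contents of test.py are not shown in the issue; a minimal sketch of what such a client script might look like is below. It uses only the standard library (so the `uv pip install openai` step is not strictly needed for this variant) and assumes the server launched above: port 8087, api key "empty", and llama-server's OpenAI-compatible /v1/chat/completions endpoint. The prompt length and model name are illustrative assumptions.

```python
# Hypothetical sketch of test.py -- the original script is not shown in the issue.
import json
import urllib.request

def make_long_number_prompt(digits: int = 3000) -> str:
    """Build a prompt containing a very long run of digits,
    the kind of input reported to trigger the bug."""
    return "What is this number? " + "1234567890" * (digits // 10)

def chat(prompt: str, base_url: str = "http://localhost:8087/v1") -> str:
    """Send one chat completion request to the server and return the reply text."""
    body = json.dumps({
        "model": "gemma-3",  # assumption: llama-server with a single loaded model
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(
        base_url + "/chat/completions",
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer empty"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Example usage (requires the llama-server instance above to be running):
#   chat(make_long_number_prompt())          # triggers the bug
#   chat("What is the capital of France?")   # now also runs to the -n 1000 limit
```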
Behaviour:
First Bad Commit
Not sure; I only started playing with parallel execution recently. It might have been there from the very beginning.
Relevant log output