Name and Version
$./llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
version: 5329 (611aa914)
built with cc (Debian 12.2.0-14) 12.2.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-cli
Command line
llama-cli \
--log-file /tmp/llamacpp-Qwen3-30B-A3B-Q8_K_XL.log \
--hf-repo unsloth/Qwen3-30B-A3B-GGUF:Q8_K_XL \
--override-tensor '([0-9]+).ffn_.*_exps.=CPU' \
--n-gpu-layers 48 \
--jinja \
--cache-type-k q8_0 \
--ctx-size 32768 \
--samplers "top_k;dry;min_p;temperature;top_p" \
--min-p 0.005 \
--top-p 0.97 \
--top-k 40 \
--temp 0.7 \
--dry-multiplier 0.7 \
--dry-allowed-length 4 \
--dry-penalty-last-n 2048 \
--presence-penalty 0.05 \
--frequency-penalty 0.005 \
--repeat-penalty 1.01 \
--repeat-last-n 16 \
--verbose \
--file generic-prompt-for-testing-1906words.txt
Problem description & steps to reproduce
The log file, together with what I hope is all the relevant information, can be found in this ephemeral repository I set up for this bug report:
https://github.com/bjodah/bug-reproducer-llamacpp-assert-triggering/tree/main
It may very well be that I'm doing something wrong here, but since it's an assertion that is triggering, I thought you might be interested in a bug report.
I first observed this error using llama-server on my laptop (Ubuntu 24.04, GeForce 1050 Mobile), but everything in this report was reproduced on a more modern system (Debian, GeForce RTX 3090).
First Bad Commit
Qwen3 support is fairly recent, so I haven't determined which older commit would be a suitable starting point for a bisection.
Relevant log output
/... lots of output, see log file in repo linked in issue description .../
eval: [ 'G':38 ]
Gn_past = 2620
/home/bjorn/vc/llama.cpp/src/llama-sampling.cpp:204: GGML_ASSERT(cur_p->size > 0) failed
/home/bjorn/.gdbinit:2: Error in sourced command file:
/home/bjorn/dotfiles/per-file/.gdbinit:22: Error in sourced command file:
Scripting in the "Python" language is not supported in this copy of GDB.
ptrace: Operation not permitted.
No stack.
The program is not being run.
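The assertion that fires, `GGML_ASSERT(cur_p->size > 0)`, checks that the sampler still has at least one candidate token left. As an illustration only (not a diagnosis of this issue, and not llama.cpp's actual code), the Python sketch below uses textbook definitions of softmax and min-p filtering (keep tokens with `p >= min_p * p_max`) to show how a filtering step can end up with zero candidates. With finite logits the most probable token always survives, but if every logit is NaN or `-inf` the probabilities become NaN and every comparison is false. The helper names `softmax` and `min_p_filter` are hypothetical.

```python
import math

# Hypothetical helpers: textbook softmax + min-p filtering.
# This is NOT llama.cpp's implementation; it only illustrates how a
# "keep p >= min_p * p_max" filter can return an empty candidate list.


def softmax(logits):
    """Max-subtracted softmax; returns a list of (token_id, prob) pairs."""
    max_logit = -math.inf
    for logit in logits:
        if logit > max_logit:  # NaN never compares greater, so it is skipped
            max_logit = logit
    # If all logits are NaN or -inf, (logit - max_logit) is NaN here.
    exps = [math.exp(logit - max_logit) for logit in logits]
    total = sum(exps)
    return [(i, e / total) for i, e in enumerate(exps)]


def min_p_filter(cands, min_p):
    """Keep candidates whose probability is at least min_p * p_max."""
    p_max = 0.0
    for _, p in cands:
        if p > p_max:  # comparisons involving NaN are always False
            p_max = p
    threshold = min_p * p_max
    return [(i, p) for i, p in cands if p >= threshold]


if __name__ == "__main__":
    MIN_P = 0.005  # the value passed via --min-p in the command line above
    cases = [
        ("finite", [2.0, 1.0, -3.0]),
        ("all NaN", [math.nan] * 3),
        ("all -inf", [-math.inf] * 3),
    ]
    for name, logits in cases:
        kept = min_p_filter(softmax(logits), MIN_P)
        # A sampler asserting that the candidate count is > 0 would abort
        # whenever `kept` is empty.
        print(f"{name}: {len(kept)} candidate(s) kept")
```

Under these assumptions the finite case keeps all three tokens, while the NaN and `-inf` cases keep none. Whether anything similar happens in the sampler chain used here (`top_k;dry;min_p;temperature;top_p`) would need to be confirmed against the actual llama.cpp sources.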