Misc. bug: llama-sampling.cpp:204: GGML_ASSERT(cur_p->size > 0) failed · Issue #13405 · ggml-org/llama.cpp · GitHub
Misc. bug: llama-sampling.cpp:204: GGML_ASSERT(cur_p->size > 0) failed #13405
@bjodah

Description


Name and Version

$./llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
version: 5329 (611aa914)
built with cc (Debian 12.2.0-14) 12.2.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-cli

Command line

llama-cli \
    --log-file /tmp/llamacpp-Qwen3-30B-A3B-Q8_K_XL.log \
    --hf-repo unsloth/Qwen3-30B-A3B-GGUF:Q8_K_XL \
    --override-tensor '([0-9]+).ffn_.*_exps.=CPU' \
    --n-gpu-layers 48 \
    --jinja \
    --cache-type-k q8_0 \
    --ctx-size 32768 \
    --samplers "top_k;dry;min_p;temperature;top_p" \
    --min-p 0.005 \
    --top-p 0.97 \
    --top-k 40 \
    --temp 0.7 \
    --dry-multiplier 0.7 \
    --dry-allowed-length 4 \
    --dry-penalty-last-n 2048 \
    --presence-penalty 0.05 \
    --frequency-penalty 0.005 \
    --repeat-penalty 1.01 \
    --repeat-last-n 16 \
    --verbose \
    --file generic-prompt-for-testing-1906words.txt

Problem description & steps to reproduce

The log file of the output, together with what I hope is all the relevant information, can be found in the ephemeral repo I put up for this bug report:
https://github.com/bjodah/bug-reproducer-llamacpp-assert-triggering/tree/main

It may very well be that I'm doing something wrong here, but since an assert is triggering, I thought you might be interested in a bug report.

I first observed this error using llama-server on my laptop (Ubuntu 24.04, GeForce 1050 Mobile), but everything in this bug report was reproduced on a more modern system (Debian, GeForce RTX 3090).

First Bad Commit

Qwen 3 support is fairly recent, so I haven't yet identified the oldest relevant commit to start a bisection from.

Relevant log output

/... lots of output, see log file in repo linked in issue description .../ 
eval: [ 'G':38 ]
G
n_past = 2620
/home/bjorn/vc/llama.cpp/src/llama-sampling.cpp:204: GGML_ASSERT(cur_p->size > 0) failed
/home/bjorn/.gdbinit:2: Error in sourced command file:
/home/bjorn/dotfiles/per-file/.gdbinit:22: Error in sourced command file:
Scripting in the "Python" language is not supported in this copy of GDB.
ptrace: Operation not permitted.
No stack.
The program is not being run.
