Misc. bug: llama-sampling.cpp:204: GGML_ASSERT(cur_p->size > 0) failed · Issue #13405 · ggml-org/llama.cpp · GitHub
Misc. bug: llama-sampling.cpp:204: GGML_ASSERT(cur_p->size > 0) failed #13405
@bjodah

Description


Name and Version

$./llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
version: 5329 (611aa914)
built with cc (Debian 12.2.0-14) 12.2.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-cli

Command line

llama-cli \
    --log-file /tmp/llamacpp-Qwen3-30B-A3B-Q8_K_XL.log \
    --hf-repo unsloth/Qwen3-30B-A3B-GGUF:Q8_K_XL \
    --override-tensor '([0-9]+).ffn_.*_exps.=CPU' \
    --n-gpu-layers 48 \
    --jinja \
    --cache-type-k q8_0 \
    --ctx-size 32768 \
    --samplers "top_k;dry;min_p;temperature;top_p" \
    --min-p 0.005 \
    --top-p 0.97 \
    --top-k 40 \
    --temp 0.7 \
    --dry-multiplier 0.7 \
    --dry-allowed-length 4 \
    --dry-penalty-last-n 2048 \
    --presence-penalty 0.05 \
    --frequency-penalty 0.005 \
    --repeat-penalty 1.01 \
    --repeat-last-n 16 \
    --verbose \
    --file generic-prompt-for-testing-1906words.txt

Problem description & steps to reproduce

The log file of the output, together with what I hope is all the relevant information, can be found in the ephemeral repo I put up for this bug report:
https://github.com/bjodah/bug-reproducer-llamacpp-assert-triggering/tree/main

It may very well be that I'm doing something wrong here, but since an assert is triggering, I thought you might be interested in a bug report.

I first observed this error using llama-server on my laptop (Ubuntu 24.04, GeForce 1050 Mobile), but everything in this bug report was reproduced on a more modern system (Debian, GeForce RTX 3090).

First Bad Commit

Qwen 3 support is fairly recent, so I haven't yet identified the oldest relevant commit to start a bisection from.

Relevant log output

/... lots of output, see log file in repo linked in issue description .../ 
eval: [ 'G':38 ]
G
n_past = 2620
/home/bjorn/vc/llama.cpp/src/llama-sampling.cpp:204: GGML_ASSERT(cur_p->size > 0) failed
/home/bjorn/.gdbinit:2: Error in sourced command file:
/home/bjorn/dotfiles/per-file/.gdbinit:22: Error in sourced command file:
Scripting in the "Python" language is not supported in this copy of GDB.
ptrace: Operation not permitted.
No stack.
The program is not being run.
