Eval bug: Error when using llama-server with Qwen3-Embedding-0.6B-GGUF

Name and Version

llama-cli --version
version: 5674 (d7da8dc)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu

Operating systems

Linux

GGML backends

CUDA

Hardware

Tesla T4

Models

Qwen3-Embedding-0.6B-f16.gguf

Problem description & steps to reproduce

When I run llama-embedding like this:

llama-embedding -m Qwen3-Embedding-0.6B-f16.gguf -e -p 'ABCDEF' --pooling last

I get the expected output.

But when I start a server:

llama-server -m Qwen3-Embedding-4B-Q4_K_M.gguf -ub 8192 --host 0.0.0.0 --port 8053 --embeddings --pooling last  --alias qwen3-embed:4b

and send a curl request:

curl -X POST "http://0.0.0.0:8053/v1/embeddings" --data '{"model": "qwen3-embed:4b", "input":"ABCDEF", "encoding_format": "float"}'
or, with the input passed as an array:
curl -X POST "http://0.0.0.0:8053/v1/embeddings" --data '{"model": "qwen3-embed:4b", "input":["ABCDEF"], "encoding_format": "float"}'

In both cases I get a 500 error response with the message "Invalid input batch.":

{"error":{"code":500,"message":"Invalid input batch.","type":"server_error"}}
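For anyone trying to reproduce this, the request above can be wrapped in a small script that also prints the HTTP status code. This is only a sketch: the names `HOST`, `MODEL`, `build_payload` and `request_embedding` are made up for illustration, the explicit `Content-Type` header is an addition that is not in the original commands, and the payload builder does no JSON escaping, so it is only suitable for simple inputs like `ABCDEF`.

```shell
#!/bin/sh
# Hypothetical repro helper; defaults mirror the server command in this report.
HOST="${HOST:-http://0.0.0.0:8053}"
MODEL="${MODEL:-qwen3-embed:4b}"

# Build the JSON body for /v1/embeddings.
# Assumption: the input contains no quotes or backslashes (no escaping is done).
build_payload() {
  printf '{"model": "%s", "input": "%s", "encoding_format": "float"}' "$MODEL" "$1"
}

# Send the request and print the response body followed by the HTTP status.
request_embedding() {
  curl -sS -X POST "$HOST/v1/embeddings" \
    -H "Content-Type: application/json" \
    --data "$(build_payload "$1")" \
    -w '\nHTTP %{http_code}\n'
}
```

With the server from this report running, `request_embedding ABCDEF` should show the error body above followed by `HTTP 500` if the problem reproduces.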

First Bad Commit

N/A

Relevant log output

slot launch_slot_: id  0 | task 15 | processing task
slot update_slots: id  0 | task 15 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 2
slot update_slots: id  0 | task 15 | kv cache rm [0, end)
slot update_slots: id  0 | task 15 | prompt processing progress, n_past = 2, n_tokens = 2, progress = 1.000000
slot update_slots: id  0 | task 15 | prompt done, n_past = 2, n_tokens = 2
decode: pooled embedding requires that all tokens are output (n_outputs_all = 1, n_tokens_all = 2)
llama_decode: failed to decode, ret = -1
srv  update_slots: Invalid input batch., i = 0, n_batch = 2048, ret = -1
slot      release: id  0 | task 15 | stop processing: n_past = 2, truncated = 0
srv    send_error: task id = 15, error: Invalid input batch.
srv  update_slots: all slots are idle
srv  cancel_tasks: cancel task, id_task = 15
srv  update_slots: all slots are idle
srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 500
