Closed
Description
Name and Version
llama-cli --version
version: 5674 (d7da8dc)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
CUDA
Hardware
Tesla T4
Models
Qwen3-Embedding-0.6B-f16.gguf
Problem description & steps to reproduce
When I run llama-embedding directly, like this:
llama-embedding -m Qwen3-Embedding-0.6B-f16.gguf -e -p 'ABCDEF' --pooling last
I get the expected output.
But when I start a server:
llama-server -m Qwen3-Embedding-4B-Q4_K_M.gguf -ub 8192 --host 0.0.0.0 --port 8053 --embeddings --pooling last --alias qwen3-embed:4b
and send a curl request:
curl -X POST "http://0.0.0.0:8053/v1/embeddings" --data '{"model": "qwen3-embed:4b", "input":"ABCDEF", "encoding_format": "float"}'
or, with the input passed as an array:
curl -X POST "http://0.0.0.0:8053/v1/embeddings" --data '{"model": "qwen3-embed:4b", "input":["ABCDEF"], "encoding_format": "float"}'
In both cases I get a 500 error response with the message "Invalid input batch.":
{"error":{"code":500,"message":"Invalid input batch.","type":"server_error"}}
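For anyone who wants to reproduce this without curl, the request above can be sketched in Python using only the standard library. This is a minimal sketch, not part of the original report: the helper names (`build_embedding_request`, `describe_error`, `request_embedding`) are hypothetical, and the explicit `Content-Type: application/json` header is an assumption, since the curl commands above do not set it. The URL, model alias, and payload fields are taken from the commands above.

```python
import json
import urllib.error
import urllib.request


def build_embedding_request(base_url, model, text):
    """Build the POST request for the /v1/embeddings endpoint used above."""
    payload = {"model": model, "input": text, "encoding_format": "float"}
    return urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the curl commands in the report do not send this header.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def describe_error(body):
    """Turn an error body like the one shown above into a one-line summary."""
    err = json.loads(body)["error"]
    return f'{err["code"]} {err["type"]}: {err["message"]}'


def request_embedding(base_url, model, text):
    """Send the request; raise RuntimeError carrying the server's error message."""
    req = build_embedding_request(base_url, model, text)
    try:
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())
    except urllib.error.HTTPError as exc:
        raise RuntimeError(describe_error(exc.read())) from exc
```

With the server from this report running, `request_embedding("http://0.0.0.0:8053", "qwen3-embed:4b", "ABCDEF")` would be expected to raise a `RuntimeError` carrying the same "Invalid input batch." message.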
First Bad Commit
N/A
Relevant log output
slot launch_slot_: id 0 | task 15 | processing task
slot update_slots: id 0 | task 15 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 2
slot update_slots: id 0 | task 15 | kv cache rm [0, end)
slot update_slots: id 0 | task 15 | prompt processing progress, n_past = 2, n_tokens = 2, progress = 1.000000
slot update_slots: id 0 | task 15 | prompt done, n_past = 2, n_tokens = 2
decode: pooled embedding requires that all tokens are output (n_outputs_all = 1, n_tokens_all = 2)
llama_decode: failed to decode, ret = -1
srv update_slots: Invalid input batch., i = 0, n_batch = 2048, ret = -1
slot release: id 0 | task 15 | stop processing: n_past = 2, truncated = 0
srv send_error: task id = 15, error: Invalid input batch.
srv update_slots: all slots are idle
srv cancel_tasks: cancel task, id_task = 15
srv update_slots: all slots are idle
srv log_server_r: request: POST /v1/embeddings 127.0.0.1 500
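The line that stands out in the log is the `decode:` message: it reports that pooled embedding requires every token to be output, yet only 1 of the 2 prompt tokens was marked for output. As a rough triage aid, the counts can be pulled out of such a line with a small script. This is only a sketch for scanning logs; the function name `output_mismatch` is hypothetical and the regular expression assumes the exact message format shown above.

```python
import re

# Matches the counters in the "decode: pooled embedding requires ..." line.
DECODE_RE = re.compile(r"n_outputs_all = (\d+), n_tokens_all = (\d+)")


def output_mismatch(log_line):
    """Return (n_outputs_all, n_tokens_all) if the two counts differ, else None."""
    match = DECODE_RE.search(log_line)
    if match is None:
        return None  # not a decode line in the expected format
    n_outputs, n_tokens = int(match.group(1)), int(match.group(2))
    return (n_outputs, n_tokens) if n_outputs != n_tokens else None


line = ("decode: pooled embedding requires that all tokens are output "
        "(n_outputs_all = 1, n_tokens_all = 2)")
print(output_mismatch(line))
```

A non-`None` result means the batch handed to `llama_decode` did not have every token flagged for output, which is the condition the log complains about.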