Eval bug: Bad output from Qwen3-Embedding-0.6B #14234
Closed
@electroglyph

Name and Version

load_backend: loaded RPC backend from C:\Users\ANON\repos\AI_Grotto\llama.cpp\Windows\VULKAN\ggml-rpc.dll
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) A770 Graphics (Intel Corporation) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 32768 | int dot: 1 | matrix cores: none
load_backend: loaded Vulkan backend from C:\Users\ANON\repos\AI_Grotto\llama.cpp\Windows\VULKAN\ggml-vulkan.dll
load_backend: loaded CPU backend from C:\Users\ANON\repos\AI_Grotto\llama.cpp\Windows\VULKAN\ggml-cpu-icelake.dll
version: 5686 (e434e69)
built with clang version 18.1.8 for x86_64-pc-windows-msvc

Operating systems

Windows

GGML backends

Vulkan

Hardware

Ryzen 7900X + Intel A770

Models

https://huggingface.co/Qwen/Qwen3-Embedding-0.6B

Problem description & steps to reproduce

GGUF model produces bad output.

I created the GGUF using the script from #14029.

The model is served with:

llama-server.exe -t 4 --threads-http 4 --mlock -m Qwen3-Embedding-0.6B.gguf --embedding --port 7777 -ngl 99 --pooling last --ctx-size 9600

and tested with this code:

import requests

MODEL = "Qwen3-Embedding-0.6B"                  # placeholder: model name as served
HEADERS = {"Content-Type": "application/json"}  # placeholder headers

payload = {
    "model": MODEL,
    "input": ["test"],
    "encoding_format": "float",
    "stream": False,
}
r = requests.post("http://127.0.0.1:7777/v1/embeddings", headers=HEADERS, json=payload)
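
The vector is then read from the response; this assumes the usual OpenAI-style {"data": [{"embedding": [...]}]} shape returned by this endpoint:

# pull the 1024-dim embedding out of the OpenAI-compatible response
emb = r.json()["data"][0]["embedding"]
print(emb[:8])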

First few values of the resulting 1024-dim tensor:

GGUF:

[0.025158854201436043, -0.10020897537469864, -0.00028968797414563596, -0.20551267266273499, 0.03785787895321846, 0.04945392906665802, 0.008871221914887428, 0.11061177402734756, 0.0017989154439419508, 0.00449336925521493, -0.018099267035722733, -0.040335990488529205, 0.03982189670205116, 0.002805062336847186, -0.07392510026693344, 0.03728350251913071, -0.048328712582588196, 0.03630823269486427, -0.06833038479089737, 0.0096778878942132, -0.008343852125108242, -0.008166425861418247, -0.007611222565174103, 0.052011676132678986, -0.0023417570628225803, -0.018826134502887726, -0.013801317662000656, -0.1342974156141281, 

Original safetensors model, after last-token pooling and normalization:

[-0.016528764739632607, -0.04132223129272461, -0.013776463456451893, -0.043908167630434036, 0.01608169451355934, 0.03396221250295639, 0.005219031125307083, 0.04472528398036957, -0.03631211444735527, 0.02987431362271309, -0.02959362417459488, -0.07498198002576828, 0.11239450424909592, -0.012434378266334534, -0.06438827514648438, 0.09462910145521164, -0.07114817202091217, 0.013470339588820934, 0.09129080176353455, -0.026682304218411446, 0.011281301267445087, 0.011008176021277905, -0.0647004097700119, 0.11493197828531265, -0.024359505623579025, 0.04530182108283043, -0.06911785155534744, 0.11223862320184708, 

My ONNX model:

[-0.016528736799955368, -0.04132211580872536, -0.013776503503322601, -0.04390852153301239, 0.016081809997558594, 0.033962178975343704, 0.005219005048274994, 0.04472558572888374, -0.036312002688646317, 0.029874227941036224, -0.02959357015788555, -0.0749819278717041, 0.11239428073167801, -0.012434416450560093, -0.06438836455345154, 0.09462911635637283, -0.0711478441953659, 0.01347041130065918, 0.0912906676530838, -0.02668222226202488, 0.011281074024736881, 0.011008230037987232, -0.06470032036304474, 0.11493247002363205, -0.024359500035643578, 0.04530187323689461, -0.06911778450012207, 0.11223872750997543, 
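
For reference, the safetensors numbers above were produced with something like the following (a sketch, not the exact script: last-token pooling via the attention mask, then L2 normalization):

import torch
from transformers import AutoModel, AutoTokenizer

# Sketch of the reference computation: last-token pooling + L2 normalize.
# The exact prompt/query formatting may differ from the script actually used.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-Embedding-0.6B")
model = AutoModel.from_pretrained("Qwen/Qwen3-Embedding-0.6B")

batch = tok(["test"], return_tensors="pt", padding=True)
with torch.no_grad():
    hidden = model(**batch).last_hidden_state

# index of the last non-padding token in each sequence
last = batch["attention_mask"].sum(dim=1) - 1
emb = hidden[torch.arange(hidden.size(0)), last]
emb = torch.nn.functional.normalize(emb, p=2, dim=1)
print(emb[0, :8].tolist())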

Also, benchmarking the GGUF on retrieval gives an accuracy roughly 20% below what it should be. Are my settings or usage incorrect?
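
A quick way to quantify the mismatch between the two backends (a sketch; gguf_vec and ref_vec are placeholder names for the full 1024-dim vectors above):

import numpy as np

def cosine(a, b):
    # cosine similarity between two embedding vectors
    a, b = np.asarray(a, dtype=np.float64), np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# gguf_vec: embedding from llama-server; ref_vec: safetensors/ONNX reference
# print(cosine(gguf_vec, ref_vec))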

First Bad Commit

No response

Relevant log output

n/a
