Fix GLM4 incoherence with fp16 accumulators by 0cc4m · Pull Request #13639 · ggml-org/llama.cpp

Merged: 1 commit merged into master on May 20, 2025

Conversation

@0cc4m (Collaborator) commented on May 19, 2025

This fixes GLM4-32B on Vulkan in combination with #13607
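For context, a minimal illustrative sketch of why accumulator precision matters here (this is not the actual Vulkan shader change; float and double stand in for fp16 and fp32 accumulators). Long dot products in large matmul rows accumulate rounding error when the running sum is kept at low precision, and for GLM4-32B that drift was enough to make generation incoherent; keeping the accumulator in a wider type avoids it:

```cpp
// Illustrative only: float vs. double stand in for fp16 vs. fp32 accumulators.
// The same products summed into a narrow accumulator drift noticeably;
// a wider accumulator stays close to the exact result.
#include <cstdio>
#include <vector>

int main() {
    const int n = 1 << 20;                 // ~1M elements, on the order of a large matmul row
    std::vector<float> a(n, 0.1f), b(n, 0.1f);

    float  acc_lo = 0.0f;                  // "fp16-like": narrow accumulator
    double acc_hi = 0.0;                   // "fp32-like": wide accumulator
    for (int i = 0; i < n; ++i) {
        acc_lo += a[i] * b[i];
        acc_hi += (double) a[i] * b[i];
    }

    printf("narrow accumulator: %.6f\n", (double) acc_lo);
    printf("wide accumulator:   %.6f\n", acc_hi);
    // Exact value is 1048576 * 0.01 = 10485.76; the narrow sum is visibly off.
    return 0;
}
```

The effect is far more pronounced with real fp16, whose ~11-bit mantissa loses the small per-element contributions once the running sum grows, which is presumably why this architecture needs higher-precision accumulation on Vulkan.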

0cc4m merged commit c9c64de into master on May 20, 2025
46 checks passed
infil00p pushed a commit to baseweight/llama.cpp that referenced this pull request on May 22, 2025
0cc4m deleted the 0cc4m/fix-vulkan-glm4 branch on May 24, 2025