Closed
Description
Name and Version
Revision 9b169a4
Operating systems
Linux
GGML backends
CPU
Hardware
ARM Ampere
Models
Qwen2.5-14B-Instruct-1M-Q5_K_M
Problem description & steps to reproduce
Setting pooling_type = LLAMA_POOLING_TYPE_MEAN in the context params and then calling llama_init_from_model() aborts with this assertion failure:
/build/source/ggml/src/ggml.c:2738: GGML_ASSERT(ggml_can_mul_mat(a, b)) failed
Setting pooling_type to LLAMA_POOLING_TYPE_LAST instead, with no other changes, works correctly.
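For anyone unfamiliar with the assert: as far as I can tell it is a shape-compatibility check on the two operands of a matrix multiplication. Below is a minimal standalone sketch of that rule, not the actual ggml code. The names tensor_shape, can_mul_mat and require_can_mul_mat are hypothetical stand-ins, and the rule itself is my reading of ggml.c, so it may differ at revision 9b169a4. I have not determined which tensors or shapes are involved in this crash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the shape part of a tensor:
 * ne[i] is the number of elements along dimension i (all assumed >= 1). */
struct tensor_shape {
    int64_t ne[4];
};

/* Assumed rule: the operands must agree on dimension 0 (the dimension that
 * is contracted), and dimensions 2 and 3 of `a` must evenly divide those of
 * `b` so that `a` can be broadcast across `b`. */
bool can_mul_mat(const struct tensor_shape *a, const struct tensor_shape *b) {
    return a->ne[0] == b->ne[0] &&
           b->ne[2] % a->ne[2] == 0 &&
           b->ne[3] % a->ne[3] == 0;
}

/* Aborts on incompatible shapes, similar in spirit to the failing assert. */
void require_can_mul_mat(const struct tensor_shape *a, const struct tensor_shape *b) {
    assert(can_mul_mat(a, b));
}
```

Under this rule, shapes {4096, 8, 1, 1} and {4096, 3, 1, 1} are compatible because dimension 0 matches, while {4096, 8, 1, 1} and {8, 3, 1, 1} are not. So the abort suggests that, with mean pooling, some matrix multiplication in the graph gets operands whose first dimensions disagree.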
First Bad Commit
No response
Relevant log output
/build/source/ggml/src/ggml.c:2738: GGML_ASSERT(ggml_can_mul_mat(a, b)) failed