Name and Version
./llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA A10, compute capability 8.6, VMM: yes
version: 5225 (a0f7016)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-quantize
Command line
cd build/bin
./llama-quantize /mnt/data/train_output/Qwen2.5-32B-f16.gguf /mnt/data/train_output/Qwen2.5-32B-Q4_K_M.gguf Q4_K_M
Problem description & steps to reproduce
The quantize process finishes successfully, but the output file cannot be loaded with the following command:
./llama-cli -m /mnt/data/train_output/Qwen2.5-32B-Q4_K_M.gguf -n 128 --color -ngl 35
The error output looks like this:
gguf_init_from_file_impl: invalid magic characters: '', expected 'GGUF'
llama_model_load: error loading model: llama_model_loader: failed to load model from /mnt/data/train_output/Qwen2.5-32B-Q4_K_M.gguf
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/mnt/data/train_output/Qwen2.5-32B-Q4_K_M.gguf'
main: error: unable to load model
I checked the header of the output GGUF file and found that the GGUF magic is missing: the beginning of the file is all zero bytes.
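A minimal standalone C sketch (not part of llama.cpp) along these lines can verify the magic: it reads the first four bytes of the file passed as an argument and compares them against 'GGUF':

```c
#include <stdio.h>
#include <string.h>

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <file.gguf>\n", argv[0]);
        return 1;
    }
    FILE * f = fopen(argv[1], "rb");
    if (!f) {
        perror("fopen");
        return 1;
    }
    unsigned char magic[4] = {0};
    size_t n = fread(magic, 1, sizeof(magic), f);
    fclose(f);
    if (n == 4 && memcmp(magic, "GGUF", 4) == 0) {
        printf("GGUF magic present\n");
    } else {
        // On the broken file this prints four zero bytes.
        printf("no GGUF magic, first bytes: %02x %02x %02x %02x\n",
               magic[0], magic[1], magic[2], magic[3]);
    }
    return 0;
}
```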
I also checked the source code of quantize.cpp and did not find any code there that writes the GGUF header.
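For context (my reading of the code, not a confirmed root cause): in ggml's gguf API the header is normally emitted by gguf_write_to_file, which puts the 'GGUF' magic, the format version, and the tensor/KV counts at offset 0. A minimal sketch, assuming the declarations in ggml's gguf.h:

```c
#include "gguf.h"

int main(void) {
    // gguf_write_to_file() starts the file with the 4-byte 'GGUF' magic,
    // the format version, and the tensor/KV counts, then the metadata.
    struct gguf_context * ctx = gguf_init_empty();
    gguf_write_to_file(ctx, "/tmp/header-test.gguf", /*only_meta =*/ true);
    gguf_free(ctx);
    return 0;
}
```

A file produced this way starts with the bytes 47 47 55 46 ('GGUF'), which is what llama-cli expects but does not find in my quantized output.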
First Bad Commit
No response