8000 Bug: ggml_metal_init error: zero-length arrays are not permitted in C++ float4x4 lo[D16/NW4]; · Issue #10208 · ggml-org/llama.cpp · GitHub
@a1ix2

Description

What happened?

I'm trying to run llama-server on Apple Silicon (M2) running macOS Ventura. I get the same error whether I use the latest release or build from source. I'm trying to load Llama-3.2-3B-Instruct (F16) from Meta; I created the GGUF with convert_hf_to_gguf.py.

$ ./llama-server -m Llama-3.2-3B-Instruct-F16.gguf --verbose

Name and Version

From source

./llama-cli --version
version: 4048 (a71d81c)
built with Apple clang version 15.0.0 (clang-1500.1.0.2.5) for arm64-apple-darwin22.6.0

From the release

$ ./llama-cli --version
version: 4044 (97404c4)
built with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin23.6.0

What operating system are you seeing the problem on?

Mac

Relevant log output

ggml_metal_init: allocating
ggml_metal_init: found device: Apple M2
ggml_metal_init: picking default device: Apple M2
ggml_metal_init: using embedded metal library
ggml_metal_init: error: Error Domain=MTLLibraryErrorDomain Code=3 "program_source:5134:17: error: zero-length arrays are not permitted in C++
    float4x4 lo[D16/NW4];
                ^~~~~~~
program_source:5391:18: note: in instantiation of function template specialization 'kernel_flash_attn_ext_vec<metal::matrix<half, 4, 4, void>, 1, &dequantize_f16, 64, 1, 32>' requested here
typedef decltype(kernel_flash_attn_ext_vec<half4x4, 1, dequantize_f16, 64>) flash_attn_ext_vec_t;
                 ^
program_source:5186:21: error: zero-length arrays are not permitted in C++
        float4x4 mq[D16/NW4];
                    ^~~~~~~
" UserInfo={NSLocalizedDescription=program_source:5134:17: error: zero-length arrays are not permitted in C++
    float4x4 lo[D16/NW4];
                ^~~~~~~
program_source:5391:18: note: in instantiation of function template specialization 'kernel_flash_attn_ext_vec<metal::matrix<half, 4, 4, void>, 1, &dequantize_f16, 64, 1, 32>' requested here
typedef decltype(kernel_flash_attn_ext_vec<half4x4, 1, dequantize_f16, 64>) flash_attn_ext_vec_t;
                 ^
program_source:5186:21: error: zero-length arrays are not permitted in C++
        float4x4 mq[D16/NW4];
                    ^~~~~~~
}
ggml_backend_metal_device_init: error: failed to allocate context
llama_new_context_with_model: failed to initialize Metal backend
common_init_from_params: failed to create context with model '/Users/username/dev/RAG/Llama-3.2-3B-Instruct-F16.gguf'
srv    load_model: failed to load model, '/Users/username/dev/RAG/Llama-3.2-3B-Instruct-F16.gguf'
main: exiting due to model loading error

Metadata

Labels: bug-unconfirmed, critical severity (used to report critical severity bugs in llama.cpp, e.g. crashing, corruption, data loss)
