8000 Bug: ggml_metal_init error: zero-length arrays are not permitted in C++ float4x4 lo[D16/NW4]; · Issue #10208 · ggml-org/llama.cpp · GitHub
@a1ix2

Description

What happened?

I'm trying to run llama-server on Apple Silicon (M2) running macOS Ventura. I get the same error whether I use the latest release or build from source. I'm trying to load Llama-3.2-3B-Instruct (F16) from Meta; I created the GGUF with convert_hf_to_gguf.py.

$ ./llama-server -m Llama-3.2-3B-Instruct-F16.gguf --verbose

Name and Version

From source

./llama-cli --version
version: 4048 (a71d81c)
built with Apple clang version 15.0.0 (clang-1500.1.0.2.5) for arm64-apple-darwin22.6.0

From the release

$ ./llama-cli --version
version: 4044 (97404c4)
built with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin23.6.0

What operating system are you seeing the problem on?

Mac

Relevant log output

ggml_metal_init: allocating
ggml_metal_init: found device: Apple M2
ggml_metal_init: picking default device: Apple M2
ggml_metal_init: using embedded metal library
ggml_metal_init: error: Error Domain=MTLLibraryErrorDomain Code=3 "program_source:5134:17: error: zero-length arrays are not permitted in C++
    float4x4 lo[D16/NW4];
                ^~~~~~~
program_source:5391:18: note: in instantiation of function template specialization 'kernel_flash_attn_ext_vec<metal::matrix<half, 4, 4, void>, 1, &dequantize_f16, 64, 1, 32>' requested here
typedef decltype(kernel_flash_attn_ext_vec<half4x4, 1, dequantize_f16, 64>) flash_attn_ext_vec_t;
                 ^
program_source:5186:21: error: zero-length arrays are not permitted in C++
        float4x4 mq[D16/NW4];
                    ^~~~~~~
" UserInfo={NSLocalizedDescription=program_source:5134:17: error: zero-length arrays are not permitted in C++
    float4x4 lo[D16/NW4];
                ^~~~~~~
program_source:5391:18: note: in instantiation of function template specialization 'kernel_flash_attn_ext_vec<metal::matrix<half, 4, 4, void>, 1, &dequantize_f16, 64, 1, 32>' requested here
typedef decltype(kernel_flash_attn_ext_vec<half4x4, 1, dequantize_f16, 64>) flash_attn_ext_vec_t;
                 ^
program_source:5186:21: error: zero-length arrays are not permitted in C++
        float4x4 mq[D16/NW4];
                    ^~~~~~~
}
ggml_backend_metal_device_init: error: failed to allocate context
llama_new_context_with_model: failed to initialize Metal backend
common_init_from_params: failed to create context with model '/Users/username/dev/RAG/Llama-3.2-3B-Instruct-F16.gguf'
srv    load_model: failed to load model, '/Users/username/dev/RAG/Llama-3.2-3B-Instruct-F16.gguf'
main: exiting due to model loading error

Metadata

Labels: bug-unconfirmed, critical severity (used to report critical severity bugs in llama.cpp, e.g. crashing, corruption, data loss)
