Name and Version
version: 5128 (3ec8878)
built with Android (11349228, +pgo, +bolt, +lto, -mlgo, based on r487747e) clang version 17.0.2 (https://android.googlesource.com/toolchain/llvm-project d9f89f4d16663d5012e5c09495f3b30ece3d2362) for x86_64-unknown-linux-gnu
Operating systems
Other? (Please let us know in description)
GGML backends
CPU
Hardware
Android
selecting device: 'QUALCOMM Adreno(TM) 750 (OpenCL 3.0 Adreno(TM) 750)'
OpenCL driver: OpenCL 3.0 QUALCOMM build: commit unknown Compiler E031.45.02.05
Models
google/gemma-3-4b-it-qat-q4_0-gguf
Problem description & steps to reproduce
I tried running the gemma3-4b model on my Android device and hit a segmentation fault. Here are the steps to reproduce the bug.
1. Build:
# build command
PROJECT_DIR=$(pwd)
THIRD_PARTY_DIR=${PROJECT_DIR}/third_party
if [ ! -d $THIRD_PARTY_DIR ]; then
    mkdir $THIRD_PARTY_DIR
fi
cd $THIRD_PARTY_DIR
if [ ! -d OpenCL-Headers ]; then
    echo "Directory OpenCL-Headers not found, cloning it."
    git clone https://github.com/KhronosGroup/OpenCL-Headers
fi
cd OpenCL-Headers && \
    cp -r CL $ANDROID_NDK/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include
cd ..
if [ ! -d OpenCL-ICD-Loader ]; then
    git clone https://github.com/KhronosGroup/OpenCL-ICD-Loader
fi
cd OpenCL-ICD-Loader
if [ -d build_ndk ]; then
    echo "Directory build_ndk already exists, removing it."
    rm -rf build_ndk
fi
mkdir build_ndk
cd build_ndk && \
    cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release \
        -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
        -DOPENCL_ICD_LOADER_HEADERS_DIR=$ANDROID_NDK/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include \
        -DANDROID_ABI=arm64-v8a \
        -DANDROID_PLATFORM=24 \
        -DANDROID_STL=c++_shared && \
    ninja && \
    cp libOpenCL.so $ANDROID_NDK/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/lib/aarch64-linux-android
cd ${PROJECT_DIR}
if [ -d build_android ]; then
    echo "Directory build_android already exists, removing it."
    rm -rf build_android
fi
mkdir build_android
cd build_android
cmake .. -G Ninja \
    -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=arm64-v8a \
    -DANDROID_PLATFORM=android-28 \
    -DBUILD_SHARED_LIBS=OFF \
    -DGGML_OPENCL=ON \
    -DGGML_OPENMP=OFF \
    -DLLAMA_CURL=OFF
ninja -j 16
2. Push the binaries and model to the device:
# push bin and model
adb push .\build\bin\ /data/local/tmp
adb push .\gemma-3-4b-it-qat-q4_0-gguf\ /data/local/tmp
3. Run the program:
adb shell
cd /data/local/tmp/bin
chmod +x llama-gemma3-cli
./llama-gemma3-cli -m ../gemma-3-4b-it-qat-q4_0-gguf/gemma-3-4b-it-q4_0.gguf \
    --mmproj ../gemma-3-4b-it-qat-q4_0-gguf/mmproj-model-f16-4B.gguf \
    -p "Describe this image in detail" \
    --image path/to/image
4. Error output:
ggml_opencl: selecting platform: 'QUALCOMM Snapdragon(TM)'
ggml_opencl: selecting device: 'QUALCOMM Adreno(TM) 750 (OpenCL 3.0 Adreno(TM) 750)'
ggml_opencl: OpenCL driver: OpenCL 3.0 QUALCOMM build: commit unknown Compiler E031.45.02.05
ggml_opencl: vector subgroup broadcast support: false
ggml_opencl: device FP16 support: true
ggml_opencl: mem base addr align: 128
ggml_opencl: max mem alloc size: 1024 MB
ggml_opencl: SVM coarse grain buffer support: true
ggml_opencl: SVM fine grain buffer support: true
ggml_opencl: SVM fine grain system support: false
ggml_opencl: SVM atomics support: true
ggml_opencl: flattening quantized weights representation as struct of arrays (GGML_OPENCL_SOA_Q)
....
....
....
load_hparams: model size: 490604.59 MiB
load_hparams: metadata size: 0.15 MiB
ggml_backend_opencl_buffer_type_alloc_buffer: failed to allocate 1128.81 MiB
ggml_gallocr_reserve_n: failed to allocate OpenCL buffer of size 1183645696
main: ../gemma-3-4b-it-qat-q4_0-gguf/gemma-3-4b-it-q4_0.gguf
encoding image...
Segmentation fault
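One possible reading of this output, offered as a guess rather than a confirmed diagnosis: the CLIP encoder runs on the OpenCL backend here ("CLIP using OpenCL backend" in the full log), it asks for a single OpenCL buffer of 1183645696 bytes, the device reports a max mem alloc size of 1024 MB, and the segmentation fault comes right after that allocation fails. The sketch below only redoes that comparison with the two numbers copied from the log. It assumes the "MB" in the `ggml_opencl` line means MiB, and the helper names `bytes_to_mib` and `exceeds_max_alloc` are made up for illustration.

```sh
#!/bin/sh
# Compare the failed OpenCL allocation with the per-allocation limit the driver reports.
# Both values are copied from the log above; nothing is queried from the device.

FAILED_BYTES=1183645696   # "failed to allocate OpenCL buffer of size 1183645696"
MAX_ALLOC_MB=1024         # "max mem alloc size: 1024 MB" (assumed to mean MiB)

# Hypothetical helper: convert a byte count to MiB with two decimals.
bytes_to_mib() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 1048576 }'
}

# Hypothetical helper: succeed (exit status 0) if $1 bytes exceeds a limit of $2 MiB.
exceeds_max_alloc() {
    [ "$1" -gt $(( $2 * 1024 * 1024 )) ]
}

echo "requested: $(bytes_to_mib "$FAILED_BYTES") MiB, limit: ${MAX_ALLOC_MB} MiB"
if exceeds_max_alloc "$FAILED_BYTES" "$MAX_ALLOC_MB"; then
    echo "the requested buffer is larger than the max mem alloc size"
else
    echo "the requested buffer fits within the max mem alloc size"
fi
```

With the values from this log it prints a requested size of 1128.81 MiB (matching the `ggml_backend_opencl_buffer_type_alloc_buffer` line) against a 1024 MiB limit, and reports that the buffer is larger than the limit.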
Has anyone run into the same bug? Any help would be appreciated.
Thanks
First Bad Commit
No response
Relevant log output
Full log output:
ggml_opencl: selecting platform: 'QUALCOMM Snapdragon(TM)'
ggml_opencl: selecting device: 'QUALCOMM Adreno(TM) 750 (OpenCL 3.0 Adreno(TM) 750)'
ggml_opencl: OpenCL driver: OpenCL 3.0 QUALCOMM build: commit unknown Compiler E031.45.02.05
ggml_opencl: vector subgroup broadcast support: false
ggml_opencl: device FP16 support: true
ggml_opencl: mem base addr align: 128
ggml_opencl: max mem alloc size: 1024 MB
ggml_opencl: SVM coarse grain buffer support: true
ggml_opencl: SVM fine grain buffer support: true
ggml_opencl: SVM fine grain system support: false
ggml_opencl: SVM atomics support: true
ggml_opencl: flattening quantized weights representation as struct of arrays (GGML_OPENCL_SOA_Q)
ggml_opencl: using kernels optimized for Adreno (GGML_OPENCL_USE_ADRENO_KERNELS)
build: 5128 (3ec88782) with Android (11349228, +pgo, +bolt, +lto, -mlgo, based on r487747e) clang version 17.0.2 (https://android.googlesource.com/toolchain/llvm-project d9f89f4d16663d5012e5c09495f3b30ece3d2362) for x86_64-unknown-linux-gnu
llama_model_load_from_file_impl: using device GPUOpenCL (QUALCOMM Adreno(TM) 750) - 0 MiB free
llama_model_loader: loaded meta data with 39 key-value pairs and 444 tensors from ../gemma-3-4b-it-qat-q4_0-gguf/gemma-3-4b-it-q4_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = gemma3
llama_model_loader: - kv 1: gemma3.context_length u32 = 131072
llama_model_loader: - kv 2: gemma3.block_count u32 = 34
llama_model_loader: - kv 3: gemma3.embedding_length u32 = 2560
llama_model_loader: - kv 4: gemma3.feed_forward_length u32 = 10240
llama_model_loader: - kv 5: gemma3.attention.head_count u32 = 8
llama_model_loader: - kv 6: gemma3.attention.head_count_kv u32 = 4
llama_model_loader: - kv 7: gemma3.attention.key_length u32 = 256
llama_model_loader: - kv 8: gemma3.attention.value_length u32 = 256
llama_model_loader: - kv 9: gemma3.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 10: gemma3.rope.scaling.type str = linear
llama_model_loader: - kv 11: gemma3.rope.scaling.factor f32 = 8.000000
llama_model_loader: - kv 12: gemma3.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 13: gemma3.attention.sliding_window u32 = 1024
llama_model_loader: - kv 14: tokenizer.ggml.model str = llama
llama_model_loader: - kv 15: tokenizer.ggml.bos_token_id u32 = 2
llama_model_loader: - kv 16: tokenizer.ggml.eos_token_id u32 = 1
llama_model_loader: - kv 17: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 3
llama_model_loader: - kv 19: tokenizer.ggml.tokens arr[str,262144] = ["<pad>", "<eos>", "<bos>", "<unk>", ...
llama_model_loader: - kv 20: tokenizer.ggml.scores arr[f32,262144] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 21: tokenizer.ggml.token_type arr[i32,262144] = [3, 3, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 22: general.quantization_version u32 = 2
llama_model_loader: - kv 23: general.file_type u32 = 2
llama_model_loader: - kv 24: tokenizer.chat_template str = {{ bos_token }} {%- if messages[0]['r...
llama_model_loader: - kv 25: gemma3.mm.tokens_per_image u32 = 256
llama_model_loader: - kv 26: gemma3.vision.attention.head_count u32 = 16
llama_model_loader: - kv 27: gemma3.vision.attention.layer_norm_epsilon f32 = 0.000001
llama_model_loader: - kv 28: gemma3.vision.block_count u32 = 27
llama_model_loader: - kv 29: gemma3.vision.embedding_length u32 = 1152
llama_model_loader: - kv 30: gemma3.vision.feed_forward_length u32 = 4304
llama_model_loader: - kv 31: gemma3.vision.image_size u32 = 896
llama_model_loader: - kv 32: gemma3.vision.num_channels u32 = 3
llama_model_loader: - kv 33: gemma3.vision.patch_size u32 = 14
llama_model_loader: - kv 34: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 35: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 36: tokenizer.ggml.add_padding_token bool = false
llama_model_loader: - kv 37: tokenizer.ggml.add_unknown_token bool = false
llama_model_loader: - kv 38: tokenizer.ggml.pre str = default
llama_model_loader: - type f32: 205 tensors
llama_model_loader: - type f16: 1 tensors
llama_model_loader: - type q4_0: 238 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_0
print_info: file size = 2.93 GiB (6.49 BPW)
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 8
load: token to piece cache size = 1.9446 MB
print_info: arch = gemma3
print_info: vocab_only = 0
print_info: n_ctx_train = 131072
print_info: n_embd = 2560
print_info: n_layer = 34
print_info: n_head = 8
print_info: n_head_kv = 4
print_info: n_rot = 256
print_info: n_swa = 1024
print_info: n_swa_pattern = 6
print_info: n_embd_head_k = 256
print_info: n_embd_head_v = 256
print_info: n_gqa = 2
print_info: n_embd_k_gqa = 1024
print_info: n_embd_v_gqa = 1024
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-06
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 6.2e-02
print_info: n_ff = 10240
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 2
print_info: rope scaling = linear
print_info: freq_base_train = 1000000.0
print_info: freq_scale_train = 0.125
print_info: n_ctx_orig_yarn = 131072
print_info: rope_finetuned = unknown
print_info: ssm_d_conv = 0
print_info: ssm_d_inner = 0
print_info: ssm_d_state = 0
print_info: ssm_dt_rank = 0
print_info: ssm_dt_b_c_rms = 0
print_info: model type = 4B
print_info: model params = 3.88 B
print_info: general.name = n/a
print_info: vocab type = SPM
print_info: n_vocab = 262144
print_info: n_merges = 0
print_info: BOS token = 2 '<bos>'
print_info: EOS token = 1 '<eos>'
print_info: EOT token = 106 '<end_of_turn>'
print_info: UNK token = 3 '<unk>'
print_info: PAD token = 0 '<pad>'
print_info: LF token = 248 '<0x0A>'
print_info: EOG token = 1 '<eos>'
print_info: EOG token = 106 '<end_of_turn>'
print_info: max token length = 93
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: offloading 0 repeating layers to GPU
load_tensors: offloaded 0/35 layers to GPU
load_tensors: CPU_Mapped model buffer size = 3002.65 MiB
...........................................................
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 4096
llama_context: n_ctx_per_seq = 4096
llama_context: n_batch = 2048
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = 0
llama_context: freq_base = 1000000.0
llama_context: freq_scale = 0.125
llama_context: n_ctx_per_seq (4096) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
llama_context: CPU output buffer size = 1.00 MiB
init: kv_size = 4096, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 34, can_shift = 1
init: CPU KV buffer size = 544.00 MiB
llama_context: KV self size = 544.00 MiB, K (f16): 272.00 MiB, V (f16): 272.00 MiB
llama_context: CPU compute buffer size = 517.00 MiB
llama_context: graph nodes = 1435
llama_context: graph splits = 1
common_init_from_params: setting dry_penalty_last_n to ctx_size = 4096
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
clip_ctx: CLIP using OpenCL backend
clip_model_loader: model name:
clip_model_loader: description:
clip_model_loader: GGUF version: 3
clip_model_loader: alignment: 32
clip_model_loader: n_tensors: 439
clip_model_loader: n_kv: 16
load_hparams: text_encoder: 0
load_hparams: vision_encoder: 1
load_hparams: llava_projector: 0
load_hparams: minicpmv_projector: 0
load_hparams: minicpmv_version: 2
load_hparams: glm_projector: 0
load_hparams: model size: 490604.59 MiB
load_hparams: metadata size: 0.15 MiB
ggml_backend_opencl_buffer_type_alloc_buffer: failed to allocate 1128.81 MiB
ggml_gallocr_reserve_n: failed to allocate OpenCL buffer of size 1183645696
main: ../gemma-3-4b-it-qat-q4_0-gguf/gemma-3-4b-it-q4_0.gguf
encoding image...
Segmentation fault
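Most of the full log is model metadata. When comparing runs, a small filter like the one below can pull out only the lines about the OpenCL allocation limit and the failure. It is a hypothetical helper, not part of llama.cpp, and it assumes the log has been saved to a file with one log entry per line; the file name in the usage comment is made up.

```sh
#!/bin/sh
# Hypothetical helper: read a saved llama.cpp log on stdin and print only the lines
# mentioning the OpenCL per-allocation limit, failed allocations, or the crash itself.
filter_alloc_lines() {
    grep -E 'max mem alloc size|failed to allocate|Segmentation fault'
}

# Usage (hypothetical file name):
#   filter_alloc_lines < gemma3-opencl.log
```

On the log above this would keep the `max mem alloc size: 1024 MB` line, the two `failed to allocate` lines, and the final `Segmentation fault`.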