Eval bug: Segmentation fault when running gemma3-cli on Android · Issue #13000 · ggml-org/llama.cpp · GitHub

Open
Russyyds opened this issue Apr 18, 2025 · 0 comments

@Russyyds
Contributor

Name and Version

version: 5128 (3ec8878)
built with Android (11349228, +pgo, +bolt, +lto, -mlgo, based on r487747e) clang version 17.0.2 (https://android.googlesource.com/toolchain/llvm-project d9f89f4d16663d5012e5c09495f3b30ece3d2362) for x86_64-unknown-linux-gnu

Operating systems

Other (Android, see description)

GGML backends

CPU

Hardware

Android
selecting device: 'QUALCOMM Adreno(TM) 750 (OpenCL 3.0 Adreno(TM) 750)'
OpenCL driver: OpenCL 3.0 QUALCOMM build: commit unknown Compiler E031.45.02.05

Models

google/gemma-3-4b-it-qat-q4_0-gguf

Problem description & steps to reproduce

I tried running the gemma3-4b model on my Android device and hit a segmentation fault. Here are the steps to reproduce the bug:

1. Build the binaries:
# build command (run from the llama.cpp source root; $ANDROID_NDK must point to the Android NDK)
PROJECT_DIR=$(pwd)
THIRD_PARTY_DIR=${PROJECT_DIR}/third_party

if [ ! -d $THIRD_PARTY_DIR ]; then
  mkdir $THIRD_PARTY_DIR
fi

cd $THIRD_PARTY_DIR
if [ ! -d OpenCL-Headers ]; then
  echo "Directory OpenCL-Headers not found, cloning it."
  git clone https://github.com/KhronosGroup/OpenCL-Headers
fi
# install the OpenCL headers into the NDK sysroot
cd OpenCL-Headers && \
cp -r CL $ANDROID_NDK/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include

cd ..
if [ ! -d OpenCL-ICD-Loader ]; then
    git clone https://github.com/KhronosGroup/OpenCL-ICD-Loader
fi

# build the OpenCL ICD loader for arm64 and copy libOpenCL.so into the NDK sysroot
cd OpenCL-ICD-Loader
if [ -d build_ndk ]; then
  echo "Directory build_ndk already exists, removing it."
  rm -rf build_ndk
fi

mkdir build_ndk
cd build_ndk && \
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
  -DOPENCL_ICD_LOADER_HEADERS_DIR=$ANDROID_NDK/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include \
  -DANDROID_ABI=arm64-v8a \
  -DANDROID_PLATFORM=24 \
  -DANDROID_STL=c++_shared && \
ninja && \
cp libOpenCL.so $ANDROID_NDK/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/lib/aarch64-linux-android

# configure and build llama.cpp with the OpenCL backend
cd ${PROJECT_DIR}
if [ -d build_android ]; then
  echo "Directory build_android already exists, removing it."
  rm -rf build_android
fi
mkdir build_android
cd build_android

cmake .. -G Ninja \
  -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
  -DANDROID_ABI=arm64-v8a \
  -DANDROID_PLATFORM=android-28 \
  -DBUILD_SHARED_LIBS=OFF \
  -DGGML_OPENCL=ON \
  -DGGML_OPENMP=OFF \
  -DLLAMA_CURL=OFF

ninja -j 16

2. Push the binaries and the model to the device:

# push binaries and model
adb push build_android/bin /data/local/tmp
adb push gemma-3-4b-it-qat-q4_0-gguf /data/local/tmp

3. Run the program:

adb shell
cd /data/local/tmp/bin
chmod +x llama-gemma3-cli
./llama-gemma3-cli -m ../gemma-3-4b-it-qat-q4_0-gguf/gemma-3-4b-it-q4_0.gguf \
                  --mmproj ../gemma-3-4b-it-qat-q4_0-gguf/mmproj-model-f16-4B.gguf \
                  -p "Describe this image in detail" \
                  --image path/to/image

4. Error output:

ggml_opencl: selecting platform: 'QUALCOMM Snapdragon(TM)'
ggml_opencl: selecting device: 'QUALCOMM Adreno(TM) 750 (OpenCL 3.0 Adreno(TM) 750)'
ggml_opencl: OpenCL driver: OpenCL 3.0 QUALCOMM build: commit unknown Compiler E031.45.02.05
ggml_opencl: vector subgroup broadcast support: false
ggml_opencl: device FP16 support: true
ggml_opencl: mem base addr align: 128
ggml_opencl: max mem alloc size: 1024 MB
ggml_opencl: SVM coarse grain buffer support: true
ggml_opencl: SVM fine grain buffer support: true
ggml_opencl: SVM fine grain system support: false
ggml_opencl: SVM atomics support: true
ggml_opencl: flattening quantized weights representation as struct of arrays (GGML_OPENCL_SOA_Q)
....
....
....
load_hparams: model size:         490604.59 MiB
load_hparams: metadata size:      0.15 MiB
ggml_backend_opencl_buffer_type_alloc_buffer: failed to allocate 1128.81 MiB
ggml_gallocr_reserve_n: failed to allocate OpenCL buffer of size 1183645696
main: ../gemma-3-4b-it-qat-q4_0-gguf/gemma-3-4b-it-q4_0.gguf
encoding image...
Segmentation fault
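For reference, the allocation that fails right before the crash asks for more memory than the per-allocation limit the OpenCL driver reports (`max mem alloc size: 1024 MB`). The quick check below only redoes that arithmetic with the two numbers copied from the log. Treating "MB" as MiB is an assumption, the helper names are hypothetical, and it is not confirmed here that this failed allocation is what triggers the segfault.

```shell
#!/bin/sh
# Sketch: compare the buffer size ggml_gallocr_reserve_n failed to allocate
# with the per-allocation limit reported by the OpenCL driver.
# Both numbers are copied from the log; "1024 MB" is assumed to mean MiB.

bytes_to_mib() {
  # Convert bytes to whole MiB (rounded down).
  echo $(( $1 / 1048576 ))
}

exceeds_limit() {
  # Exit status 0 when the requested size ($1) is larger than the limit ($2).
  [ "$1" -gt "$2" ]
}

requested=1183645696            # "failed to allocate OpenCL buffer of size 1183645696"
max_alloc=$((1024 * 1048576))   # "max mem alloc size: 1024 MB"

echo "requested: $(bytes_to_mib "$requested") MiB, limit: $(bytes_to_mib "$max_alloc") MiB"
if exceeds_limit "$requested" "$max_alloc"; then
  echo "the request is larger than the reported per-allocation limit"
fi
```

It prints `requested: 1128 MiB, limit: 1024 MiB` and then the "larger than" message; the log's 1128.81 MiB differs only because the helper rounds down.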

Has anyone run into the same bug? Any help would be appreciated.

Thanks

First Bad Commit

No response

Relevant log output

Full log output:

ggml_opencl: selecting platform: 'QUALCOMM Snapdragon(TM)'
ggml_opencl: selecting device: 'QUALCOMM Adreno(TM) 750 (OpenCL 3.0 Adreno(TM) 750)'
ggml_opencl: OpenCL driver: OpenCL 3.0 QUALCOMM build: commit unknown Compiler E031.45.02.05
ggml_opencl: vector subgroup broadcast support: false
ggml_opencl: device FP16 support: true
ggml_opencl: mem base addr align: 128
ggml_opencl: max mem alloc size: 1024 MB
ggml_opencl: SVM coarse grain buffer support: true
ggml_opencl: SVM fine grain buffer support: true
ggml_opencl: SVM fine grain system support: false
ggml_opencl: SVM atomics support: true
ggml_opencl: flattening quantized weights representation as struct of arrays (GGML_OPENCL_SOA_Q)
ggml_opencl: using kernels optimized for Adreno (GGML_OPENCL_USE_ADRENO_KERNELS)
build: 5128 (3ec88782) with Android (11349228, +pgo, +bolt, +lto, -mlgo, based on r487747e) clang version 17.0.2 (https://android.googlesource.com/toolchain/llvm-project d9f89f4d16663d5012e5c09495f3b30ece3d2362) for x86_64-unknown-linux-gnu
llama_model_load_from_file_impl: using device GPUOpenCL (QUALCOMM Adreno(TM) 750) - 0 MiB free
llama_model_loader: loaded meta data with 39 key-value pairs and 444 tensors from ../gemma-3-4b-it-qat-q4_0-gguf/gemma-3-4b-it-q4_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = gemma3
llama_model_loader: - kv   1:                      gemma3.context_length u32              = 131072
llama_model_loader: - kv   2:                         gemma3.block_count u32              = 34
llama_model_loader: - kv   3:                    gemma3.embedding_length u32              = 2560
llama_model_loader: - kv   4:                 gemma3.feed_forward_length u32              = 10240
llama_model_loader: - kv   5:                gemma3.attention.head_count u32              = 8
llama_model_loader: - kv   6:             gemma3.attention.head_count_kv u32              = 4
llama_model_loader: - kv   7:                gemma3.attention.key_length u32              = 256
llama_model_loader: - kv   8:              gemma3.attention.value_length u32              = 256
llama_model_loader: - kv   9:    gemma3.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  10:                   gemma3.rope.scaling.type str              = linear
llama_model_loader: - kv  11:                 gemma3.rope.scaling.factor f32              = 8.000000
llama_model_loader: - kv  12:                      gemma3.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  13:            gemma3.attention.sliding_window u32              = 1024
llama_model_loader: - kv  14:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  15:                tokenizer.ggml.bos_token_id u32              = 2
llama_model_loader: - kv  16:                tokenizer.ggml.eos_token_id u32              = 1
llama_model_loader: - kv  17:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  18:            tokenizer.ggml.unknown_token_id u32              = 3
llama_model_loader: - kv  19:                      tokenizer.ggml.tokens arr[str,262144]  = ["<pad>", "<eos>", "<bos>", "<unk>", ...
llama_model_loader: - kv  20:                      tokenizer.ggml.scores arr[f32,262144]  = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  21:                  tokenizer.ggml.token_type arr[i32,262144]  = [3, 3, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  22:               general.quantization_version u32              = 2
llama_model_loader: - kv  23:                          general.file_type u32              = 2
llama_model_loader: - kv  24:                    tokenizer.chat_template str              = {{ bos_token }} {%- if messages[0]['r...
llama_model_loader: - kv  25:                 gemma3.mm.tokens_per_image u32              = 256
llama_model_loader: - kv  26:         gemma3.vision.attention.head_count u32              = 16
llama_model_loader: - kv  27: gemma3.vision.attention.layer_norm_epsilon f32              = 0.000001
llama_model_loader: - kv  28:                  gemma3.vision.block_count u32              = 27
llama_model_loader: - kv  29:             gemma3.vision.embedding_length u32              = 1152
llama_model_loader: - kv  30:          gemma3.vision.feed_forward_length u32              = 4304
llama_model_loader: - kv  31:                   gemma3.vision.image_size u32              = 896
llama_model_loader: - kv  32:                 gemma3.vision.num_channels u32              = 3
llama_model_loader: - kv  33:                   gemma3.vision.patch_size u32              = 14
llama_model_loader: - kv  34:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  35:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  36:           tokenizer.ggml.add_padding_token bool             = false
llama_model_loader: - kv  37:           tokenizer.ggml.add_unknown_token bool             = false
llama_model_loader: - kv  38:                         tokenizer.ggml.pre str              = default
llama_model_loader: - type  f32:  205 tensors
llama_model_loader: - type  f16:    1 tensors
llama_model_loader: - type q4_0:  238 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_0
print_info: file size   = 2.93 GiB (6.49 BPW)
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 8
load: token to piece cache size = 1.9446 MB
print_info: arch             = gemma3
print_info: vocab_only       = 0
print_info: n_ctx_train      = 131072
print_info: n_embd           = 2560
print_info: n_layer          = 34
print_info: n_head           = 8
print_info: n_head_kv        = 4
print_info: n_rot            = 256
print_info: n_swa            = 1024
print_info: n_swa_pattern    = 6
print_info: n_embd_head_k    = 256
print_info: n_embd_head_v    = 256
print_info: n_gqa            = 2
print_info: n_embd_k_gqa     = 1024
print_info: n_embd_v_gqa     = 1024
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-06
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 6.2e-02
print_info: n_ff             = 10240
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 2
print_info: rope scaling     = linear
print_info: freq_base_train  = 1000000.0
print_info: freq_scale_train = 0.125
print_info: n_ctx_orig_yarn  = 131072
print_info: rope_finetuned   = unknown
print_info: ssm_d_conv       = 0
print_info: ssm_d_inner      = 0
print_info: ssm_d_state      = 0
print_info: ssm_dt_rank      = 0
print_info: ssm_dt_b_c_rms   = 0
print_info: model type       = 4B
print_info: model params     = 3.88 B
print_info: general.name     = n/a
print_info: vocab type       = SPM
print_info: n_vocab          = 262144
print_info: n_merges         = 0
print_info: BOS token        = 2 '<bos>'
print_info: EOS token        = 1 '<eos>'
print_info: EOT token        = 106 '<end_of_turn>'
print_info: UNK token        = 3 '<unk>'
print_info: PAD token        = 0 '<pad>'
print_info: LF token         = 248 '<0x0A>'
print_info: EOG token        = 1 '<eos>'
print_info: EOG token        = 106 '<end_of_turn>'
print_info: max token length = 93
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: offloading 0 repeating layers to GPU
load_tensors: offloaded 0/35 layers to GPU
load_tensors:   CPU_Mapped model buffer size =  3002.65 MiB
...........................................................
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 4096
llama_context: n_ctx_per_seq = 4096
llama_context: n_batch       = 2048
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 1
llama_context: flash_attn    = 0
llama_context: freq_base     = 1000000.0
llama_context: freq_scale    = 0.125
llama_context: n_ctx_per_seq (4096) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
llama_context:        CPU  output buffer size =     1.00 MiB
init: kv_size = 4096, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 34, can_shift = 1
init:        CPU KV buffer size =   544.00 MiB
llama_context: KV self size  =  544.00 MiB, K (f16):  272.00 MiB, V (f16):  272.00 MiB
llama_context:        CPU compute buffer size =   517.00 MiB
llama_context: graph nodes  = 1435
llama_context: graph splits = 1
common_init_from_params: setting dry_penalty_last_n to ctx_size = 4096
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
clip_ctx: CLIP using OpenCL backend
clip_model_loader: model name:
clip_model_loader: description:
clip_model_loader: GGUF version: 3
clip_model_loader: alignment:    32
clip_model_loader: n_tensors:    439
clip_model_loader: n_kv:         16

load_hparams: text_encoder:       0
load_hparams: vision_encoder:     1
load_hparams: llava_projector:    0
load_hparams: minicpmv_projector: 0
load_hparams: minicpmv_version:   2
load_hparams: glm_projector:      0
load_hparams: model size:         490604.59 MiB
load_hparams: metadata size:      0.15 MiB
ggml_backend_opencl_buffer_type_alloc_buffer: failed to allocate 1128.81 MiB
ggml_gallocr_reserve_n: failed to allocate OpenCL buffer of size 1183645696
main: ../gemma-3-4b-it-qat-q4_0-gguf/gemma-3-4b-it-q4_0.gguf
encoding image...
Segmentation fault
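The CPU-side KV cache size in the log (`KV self size = 544.00 MiB`) can be reproduced from the printed hyperparameters. This sketch assumes K and V are each stored as f16 (2 bytes per element) with `n_embd_k_gqa` = `n_embd_v_gqa` = 1024 elements per token per layer, for `n_ctx` = 4096 and `n_layer` = 34, as reported above; the helper name is hypothetical.

```shell
#!/bin/sh
# Sketch: reproduce the logged KV cache size from the logged hyperparameters.
# Assumes one row of n_embd_gqa f16 values per token per layer for K (and for V).

kv_tensor_bytes() {
  # $1 = n_ctx, $2 = n_embd_gqa, $3 = bytes per element, $4 = n_layer
  echo $(( $1 * $2 * $3 * $4 ))
}

k_bytes=$(kv_tensor_bytes 4096 1024 2 34)   # K cache; V has the same shape here
total=$(( 2 * k_bytes ))                    # K + V

echo "K: $(( k_bytes / 1048576 )) MiB, K+V: $(( total / 1048576 )) MiB"
```

It prints `K: 272 MiB, K+V: 544 MiB`, matching the `K (f16): 272.00 MiB` and `KV self size = 544.00 MiB` lines in the log.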