Description
I tried to build llama.cpp on an Android device with GPU support by following the official documentation, but it failed.
1. Termux environment
2. OpenCL
I got stuck here:
So I switched to another build method, as follows:
1.
Use Termux's default CMake generator instead of Ninja.
2.
cmake .. -DCMAKE_BUILD_TYPE=Release \
  -DOPENCL_ICD_LOADER_HEADERS_DIR=/data/data/com.termux/files/usr/include \
  -DCMAKE_C_COMPILER=/data/data/com.termux/files/usr/bin/clang \
  -DCMAKE_CXX_COMPILER=/data/data/com.termux/files/usr/bin/clang++ \
  -DCMAKE_C_FLAGS="--target=aarch64-linux-android24 -D_POSIX_C_SOURCE=200809L" \
  -DCMAKE_CXX_FLAGS="--target=aarch64-linux-android24 -D_POSIX_C_SOURCE=200809L"
3.
cmake .. -DBUILD_SHARED_LIBS=ON -DGGML_OPENCL=ON -DGGML_OPENCL_EMBED_KERNELS=ON -DGGML_OPENCL_USE_ADRENO_KERNELS=ON
4.
cmake --build build-android
cmake --build . --config Release
The build succeeded and the binaries run, but the OpenCL (GPU) backend is much slower than the CPU.
GPU bench:
~/.../build-android/bin $ ./llama-bench -m /data/local/tmp/llama.cpp/SmolVLM2-500M-Video-Instruct-Q8_0.gguf
ggml_opencl: selected platform: 'QUALCOMM Snapdragon(TM)'
ggml_opencl: device: 'QUALCOMM Adreno(TM) 830 (OpenCL 3.0 Adreno(TM) 830)'
ggml_opencl: OpenCL driver: OpenCL 3.0 QUALCOMM build: commit unknown Compiler E031.47.14.01
ggml_opencl: vector subgroup broadcast support: true
ggml_opencl: device FP16 support: true
ggml_opencl: mem base addr align: 128
ggml_opencl: max mem alloc size: 1024 MB
ggml_opencl: SVM coarse grain buffer support: true
ggml_opencl: SVM fine grain buffer support: true
ggml_opencl: SVM fine grain system support: false
ggml_opencl: SVM atomics support: true
ggml_opencl: flattening quantized weights representation as struct of arrays (GGML_OPENCL_SOA_Q)
ggml_opencl: using kernels optimized for Adreno (GGML_OPENCL_USE_ADRENO_KERNELS)
ggml_opencl: loading OpenCL kernels............................................
ggml_opencl: default device: 'QUALCOMM Adreno(TM) 830 (OpenCL 3.0 Adreno(TM) 830)'
| model | size | params | backend | ngl | test | t/s |
|---|---|---|---|---|---|---|
| llama 8B Q8_0 | 414.86 MiB | 409.25 M | OpenCL | 99 | pp512 | 115.82 ± 2.96 |
| llama 8B Q8_0 | 414.86 MiB | 409.25 M | OpenCL | 99 | tg128 | 14.31 ± 0.19 |
build: 53ae306 (5528)
CPU bench:
~/.../build-android-cpu/bin $ ./llama-bench -m /data/local/tmp/llama.cpp/SmolVLM2-500M-Video-Instruct-Q8_0.gguf
| model | size | params | backend | threads | test | t/s |
|---|---|---|---|---|---|---|
| llama 8B Q8_0 | 414.86 MiB | 409.25 M | CPU | 8 | pp512 | 404.00 ± 3.75 |
| llama 8B Q8_0 | 414.86 MiB | 409.25 M | CPU | 8 | tg128 | 109.77 ± 0.44 |
build: 53ae306 (5528)
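To put the gap in numbers, here is a small sketch that divides the CPU throughput by the OpenCL throughput for each test. The `slowdown` helper is a hypothetical name used only for illustration and is not part of llama.cpp. The inputs are the mean t/s values copied from the two tables above, and the ± spread is ignored.

```shell
#!/bin/sh
# Hypothetical helper (not part of llama.cpp): print cpu_tps / gpu_tps
# rounded to two decimals. LC_ALL=C keeps "." as the decimal separator.
slowdown() {
    LC_ALL=C awk -v cpu="$1" -v gpu="$2" 'BEGIN { printf "%.2f\n", cpu / gpu }'
}

# Mean t/s values copied from the llama-bench tables above.
echo "pp512: CPU is $(slowdown 404.00 115.82)x faster than OpenCL"
echo "tg128: CPU is $(slowdown 109.77 14.31)x faster than OpenCL"
```

With these inputs, the CPU is roughly 3.5x faster for prompt processing (pp512) and roughly 7.7x faster for token generation (tg128).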
I am really confused by this. Is it caused by a mistake in my build process, or is it simply not optimized properly?
Could it be related to the OpenCL version or the Android NDK version?
I am new to this. Thank you.