Android build with the OpenCL (GPU) backend is slower than the CPU build?

I tried to build llama.cpp with GPU support directly on an Android device by following the official documentation, but the build failed. My environment:

1. Termux
2. OpenCL

I got stuck here:

![Image](https://github.com/user-attachments/assets/918e9903-f500-41ba-ae96-33ee73818009)

![Image](https://github.com/user-attachments/assets/8673bcf2-8ac2-4a37-a02e-d684d6759cfe)

So I switched to a different build method:

1. Use the default CMake generator in Termux instead of Ninja.
2. Configure with the Termux clang toolchain:
   ```sh
   cmake .. -DCMAKE_BUILD_TYPE=Release \
     -DOPENCL_ICD_LOADER_HEADERS_DIR=/data/data/com.termux/files/usr/include \
     -DCMAKE_C_COMPILER=/data/data/com.termux/files/usr/bin/clang \
     -DCMAKE_CXX_COMPILER=/data/data/com.termux/files/usr/bin/clang++ \
     -DCMAKE_C_FLAGS="--target=aarch64-linux-android24 -D_POSIX_C_SOURCE=200809L" \
     -DCMAKE_CXX_FLAGS="--target=aarch64-linux-android24 -D_POSIX_C_SOURCE=200809L"
   ```
3. Enable the OpenCL backend:
   ```sh
   cmake .. -DBUILD_SHARED_LIBS=ON -DGGML_OPENCL=ON -DGGML_OPENCL_EMBED_KERNELS=ON -DGGML_OPENCL_USE_ADRENO_KERNELS=ON
   ```
4. Build:
   ```sh
   cmake --build build-android
   cmake --build . --config Release
   ```
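For reference, steps 2 to 4 can probably be collapsed into one configure call plus one build call. CMake caches `-D` values in the build directory, so re-running `cmake ..` in step 3 should keep the compiler settings from step 2; this sketch is meant to be equivalent to those steps, not a fix. It assumes it is run from the llama.cpp source root inside Termux, with clang and the OpenCL headers/ICD loader installed under the Termux prefix. Reusing `build-android` as the build directory name is my own choice.

```sh
#!/bin/sh
# Sketch only: one configure + one build, combining the flags from steps 2 and 3.
# Assumes: run from the llama.cpp source root in Termux; OpenCL headers and
# ICD loader installed under /data/data/com.termux/files/usr.
set -e

cmake -S . -B build-android \
  -DCMAKE_BUILD_TYPE=Release \
  -DBUILD_SHARED_LIBS=ON \
  -DGGML_OPENCL=ON \
  -DGGML_OPENCL_EMBED_KERNELS=ON \
  -DGGML_OPENCL_USE_ADRENO_KERNELS=ON \
  -DOPENCL_ICD_LOADER_HEADERS_DIR=/data/data/com.termux/files/usr/include \
  -DCMAKE_C_COMPILER=/data/data/com.termux/files/usr/bin/clang \
  -DCMAKE_CXX_COMPILER=/data/data/com.termux/files/usr/bin/clang++ \
  -DCMAKE_C_FLAGS="--target=aarch64-linux-android24 -D_POSIX_C_SOURCE=200809L" \
  -DCMAKE_CXX_FLAGS="--target=aarch64-linux-android24 -D_POSIX_C_SOURCE=200809L"

# Binaries such as llama-bench end up in build-android/bin.
cmake --build build-android --config Release
```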


The build succeeded, but the OpenCL (GPU) backend is considerably slower than the CPU backend.

**GPU bench:**
```
~/.../build-android/bin $ ./llama-bench -m /data/local/tmp/llama.cpp/SmolVLM2-500M-Video-Instruct-Q8_0.gguf
ggml_opencl: selected platform: 'QUALCOMM Snapdragon(TM)'
ggml_opencl: device: 'QUALCOMM Adreno(TM) 830 (OpenCL 3.0 Adreno(TM) 830)'
ggml_opencl: OpenCL driver: OpenCL 3.0 QUALCOMM build: commit unknown Compiler E031.47.14.01
ggml_opencl: vector subgroup broadcast support: true
ggml_opencl: device FP16 support: true
ggml_opencl: mem base addr align: 128
ggml_opencl: max mem alloc size: 1024 MB
ggml_opencl: SVM coarse grain buffer support: true
ggml_opencl: SVM fine grain buffer support: true
ggml_opencl: SVM fine grain system support: false
ggml_opencl: SVM atomics support: true
ggml_opencl: flattening quantized weights representation as struct of arrays (GGML_OPENCL_SOA_Q)
ggml_opencl: using kernels optimized for Adreno (GGML_OPENCL_USE_ADRENO_KERNELS)
ggml_opencl: loading OpenCL kernels............................................
ggml_opencl: default device: 'QUALCOMM Adreno(TM) 830 (OpenCL 3.0 Adreno(TM) 830)'
```

| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| llama 8B Q8_0                  | 414.86 MiB |   409.25 M | OpenCL     |  99 |           pp512 |        115.82 ± 2.96 |
| llama 8B Q8_0                  | 414.86 MiB |   409.25 M | OpenCL     |  99 |           tg128 |         14.31 ± 0.19 |

build: 53ae3064 (5528)


**CPU bench:**
```
~/.../build-android-cpu/bin $ ./llama-bench -m /data/local/tmp/llama.cpp/SmolVLM2-500M-Video-Instruct-Q8_0.gguf
```

| model                          |       size |     params | backend    | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| llama 8B Q8_0                  | 414.86 MiB |   409.25 M | CPU        |       8 |           pp512 |        404.00 ± 3.75 |
| llama 8B Q8_0                  | 414.86 MiB |   409.25 M | CPU        |       8 |           tg128 |        109.77 ± 0.44 |

build: 53ae3064 (5528)
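To put a number on the gap, the throughput figures from the two tables can be divided directly (pp512 is prompt processing of 512 tokens, tg128 is generation of 128 tokens). This is plain arithmetic on the reported means and ignores the ± spread; the helper name `ratio` is only for illustration.

```sh
#!/bin/sh
# Throughput (tokens/s) copied from the llama-bench tables above.
cpu_pp=404.00; gpu_pp=115.82   # pp512: prompt processing, 512 tokens
cpu_tg=109.77; gpu_tg=14.31    # tg128: text generation, 128 tokens

# ratio A B: print A / B rounded to two decimals (POSIX sh has no float math).
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

echo "pp512: CPU is $(ratio "$cpu_pp" "$gpu_pp")x faster than OpenCL"
echo "tg128: CPU is $(ratio "$cpu_tg" "$gpu_tg")x faster than OpenCL"
```

This prints roughly 3.49x for pp512 and 7.67x for tg128, so the slowdown is much larger for token generation than for prompt processing.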


I'm really confused by this. Is it caused by a mistake in my build process, or is the OpenCL backend just not well optimized for this case? Could the OpenCL version or the Android NDK version be the cause?
I'm a newcomer, so thanks for any help.