android built on GPU cannot comparable with CPU? · Issue #13910 · ggml-org/llama.cpp · GitHub
android built on GPU cannot comparable with CPU? #13910
Open
@YoiteShao

Description

I tried to build for an Android device with GPU support, but I could not get the steps in the official documentation to work:
1. Termux environment
2. OpenCL

I got stuck here:

Image

Image

So I switched to a different build method:
1. Use Termux's default CMake generator instead of Ninja.
2. Configure the compiler and target:
   cmake .. -DCMAKE_BUILD_TYPE=Release \
     -DOPENCL_ICD_LOADER_HEADERS_DIR=/data/data/com.termux/files/usr/include \
     -DCMAKE_C_COMPILER=/data/data/com.termux/files/usr/bin/clang \
     -DCMAKE_CXX_COMPILER=/data/data/com.termux/files/usr/bin/clang++ \
     -DCMAKE_C_FLAGS="--target=aarch64-linux-android24 -D_POSIX_C_SOURCE=200809L" \
     -DCMAKE_CXX_FLAGS="--target=aarch64-linux-android24 -D_POSIX_C_SOURCE=200809L"
3. Enable the OpenCL backend:
   cmake .. -DBUILD_SHARED_LIBS=ON -DGGML_OPENCL=ON -DGGML_OPENCL_EMBED_KERNELS=ON -DGGML_OPENCL_USE_ADRENO_KERNELS=ON
4. Build:
   cmake --build build-android
   cmake --build . --config Release
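
For reference, the two configure passes above could probably be merged into one invocation, since repeated `cmake` runs in the same build directory accumulate their `-D` settings in the cache. The following is only a sketch under assumptions that are not in the post: it is run from the llama.cpp source root, it uses `-S`/`-B` to name the source and build directories, and the `RUN` variable is a hypothetical switch added here so the script prints the command by default instead of executing it.

```shell
#!/bin/sh
# Sketch: steps 2 and 3 merged into one configure call.
# Assumption: executed from the llama.cpp source root; build-android is the build dir.
PREFIX=/data/data/com.termux/files/usr
FLAGS="--target=aarch64-linux-android24 -D_POSIX_C_SOURCE=200809L"

# Collect the arguments as positional parameters so values with spaces stay intact.
set -- \
  -S . -B build-android \
  -DCMAKE_BUILD_TYPE=Release \
  "-DOPENCL_ICD_LOADER_HEADERS_DIR=$PREFIX/include" \
  "-DCMAKE_C_COMPILER=$PREFIX/bin/clang" \
  "-DCMAKE_CXX_COMPILER=$PREFIX/bin/clang++" \
  "-DCMAKE_C_FLAGS=$FLAGS" \
  "-DCMAKE_CXX_FLAGS=$FLAGS" \
  -DBUILD_SHARED_LIBS=ON \
  -DGGML_OPENCL=ON \
  -DGGML_OPENCL_EMBED_KERNELS=ON \
  -DGGML_OPENCL_USE_ADRENO_KERNELS=ON

if [ "${RUN:-0}" = "1" ]; then
  # RUN=1 (hypothetical switch): configure, then build.
  cmake "$@" && cmake --build build-android --config Release
else
  # Dry run: print the command for inspection (shell quoting is not preserved).
  echo cmake "$@"
fi
```

Running it with `RUN=1` set would configure and build in `build-android`; without it, the script only echoes the command, which makes it easy to compare against the flags listed in the steps.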

The build succeeded and everything ran, but performance was lower than with the CPU build.
GPU bench:
~/.../build-android/bin $ ./llama-bench -m /data/local/tmp/llama.cpp/SmolVLM2-500M-Video-Instruct-Q8_0.gguf
ggml_opencl: selected platform: 'QUALCOMM Snapdragon(TM)'

ggml_opencl: device: 'QUALCOMM Adreno(TM) 830 (OpenCL 3.0 Adreno(TM) 830)'
ggml_opencl: OpenCL driver: OpenCL 3.0 QUALCOMM build: commit unknown Compiler E031.47.14.01
ggml_opencl: vector subgroup broadcast support: true
ggml_opencl: device FP16 support: true
ggml_opencl: mem base addr align: 128
ggml_opencl: max mem alloc size: 1024 MB
ggml_opencl: SVM coarse grain buffer support: true
ggml_opencl: SVM fine grain buffer support: true
ggml_opencl: SVM fine grain system support: false
ggml_opencl: SVM atomics support: true
ggml_opencl: flattening quantized weights representation as struct of arrays (GGML_OPENCL_SOA_Q)
ggml_opencl: using kernels optimized for Adreno (GGML_OPENCL_USE_ADRENO_KERNELS)
ggml_opencl: loading OpenCL kernels............................................
ggml_opencl: default device: 'QUALCOMM Adreno(TM) 830 (OpenCL 3.0 Adreno(TM) 830)'

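| model         | size       | params   | backend | ngl | test  | t/s           |
| ------------- | ---------- | -------- | ------- | --- | ----- | ------------- |
| llama 8B Q8_0 | 414.86 MiB | 409.25 M | OpenCL  | 99  | pp512 | 115.82 ± 2.96 |
| llama 8B Q8_0 | 414.86 MiB | 409.25 M | OpenCL  | 99  | tg128 | 14.31 ± 0.19  |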

build: 53ae306 (5528)

CPU bench:
~/.../build-android-cpu/bin $ ./llama-bench -m /data/local/tmp/llama.cpp/SmolVLM2-500M-Video-Instruct-Q8_0.gguf

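| model         | size       | params   | backend | threads | test  | t/s           |
| ------------- | ---------- | -------- | ------- | ------- | ----- | ------------- |
| llama 8B Q8_0 | 414.86 MiB | 409.25 M | CPU     | 8       | pp512 | 404.00 ± 3.75 |
| llama 8B Q8_0 | 414.86 MiB | 409.25 M | CPU     | 8       | tg128 | 109.77 ± 0.44 |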

build: 53ae306 (5528)
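
To put a number on the gap, the mean t/s values from the two runs can be divided directly. This is just a small sketch: `ratio` is a helper name made up here, and it uses only the means, ignoring the ± spread. By this measure the CPU build is roughly 3.5x faster on pp512 and roughly 7.7x faster on tg128.

```shell
#!/bin/sh
# ratio CPU_TPS GPU_TPS: prints CPU t/s divided by OpenCL t/s, two decimals.
# (Hypothetical helper; inputs are the mean values copied from the bench output.)
ratio() {
  awk -v cpu="$1" -v gpu="$2" 'BEGIN { printf "%.2f\n", cpu / gpu }'
}

echo "pp512: CPU / OpenCL = $(ratio 404.00 115.82)"
echo "tg128: CPU / OpenCL = $(ratio 109.77 14.31)"
```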

I am really confused by this. Is it caused by a mistake in my build process, or is the OpenCL backend not properly optimized?
Could the OpenCL version or the Android NDK version be the cause?
I am a newcomer; thank you.
