Closed
Description
Name and Version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3080, compute capability 8.6, VMM: yes
version: 5331 (33eff40)
built with cc (Ubuntu 14.2.0-4ubuntu2) 14.2.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
CUDA
Hardware
i7-9700K + GTX 3080 10GB VRAM
Models
Qwen2.5 VL (Q4_K_M)
Problem description & steps to reproduce
Upload a big image (i.e. 3MB for my case) without --no-mmproj-offload
. The server crashes on an OOM.
First Bad Commit
Relevant log output
srv process_chun: processing image...
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 4514.05 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 4733328896
/devel/tools/llama.cpp/ggml/src/ggml-backend.cpp:1662: GGML_ASSERT((char *)addr + ggml_backend_buffer_get_alloc_size(buffer, tensor) <= (char *)ggml_backend_buffer_get_base(buffer) + ggml_backend_buffer_get_size(buffer)) failed
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
Aborted (core dumped)