Eval bug: mtmd in server mode crashes on a large image

Name and Version

ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3080, compute capability 8.6, VMM: yes
version: 5331 (33eff40)
built with cc (Ubuntu 14.2.0-4ubuntu2) 14.2.0 for x86_64-linux-gnu

Operating systems

Linux

GGML backends

CUDA

Hardware

i7-9700K + RTX 3080 (10 GB VRAM)

Models

Qwen2.5 VL (Q4_K_M)

Problem description & steps to reproduce

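Upload a large image (e.g. 3 MB in my case) to the server running without --no-mmproj-offload. Allocating the CUDA buffer for the image fails with an out-of-memory (OOM) error, and instead of returning an error the server then aborts on a GGML_ASSERT.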
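A possible client-side mitigation, untested against this bug, is to downscale images before upload so the vision encoder processes fewer patches and presumably needs a smaller compute buffer. The sketch below only computes target dimensions. It assumes a 28 px granularity (14 px patches with 2x2 merging, as in the Hugging Face Qwen2.5-VL preprocessor); `capped_size`, `PATCH_FACTOR`, and any `max_pixels` value are hypothetical and not part of llama.cpp.

```python
import math

# Assumption: 28 = 14 px patch size x 2x2 patch merge (Qwen2.5-VL style).
# Adjust if your mmproj uses a different patch layout.
PATCH_FACTOR = 28


def capped_size(width: int, height: int, max_pixels: int,
                factor: int = PATCH_FACTOR) -> tuple[int, int]:
    """Hypothetical helper: scale (width, height) down, keeping the aspect
    ratio, so the area is at most roughly max_pixels, then round each side
    down to a multiple of `factor` (never below `factor`).
    Images already under the cap are only rounded down."""
    # Uniform scale factor; 1.0 means the image is already small enough.
    scale = min(1.0, math.sqrt(max_pixels / (width * height)))
    new_w = max(factor, math.floor(width * scale / factor) * factor)
    new_h = max(factor, math.floor(height * scale / factor) * factor)
    return new_w, new_h
```

For example, `capped_size(4032, 3024, 1024 * 1024)` returns `(1176, 868)`. The actual resize can then be done with any image library (for instance Pillow's `Image.resize`) before base64-encoding the image for the request.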

First Bad Commit

33eff40

Relevant log output

srv  process_chun: processing image...
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 4514.05 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 4733328896
/devel/tools/llama.cpp/ggml/src/ggml-backend.cpp:1662: GGML_ASSERT((char *)addr + ggml_backend_buffer_get_alloc_size(buffer, tensor) <= (char *)ggml_backend_buffer_get_base(buffer) + ggml_backend_buffer_get_size(buffer)) failed
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
Aborted (core dumped)
