llama-server not using GPU #1826
Comments
I'm seeing the same problem with Vulkan. I see all layers of the model actually getting loaded on the GPU, and nvtop shows significant memory use. llama.cpp itself works fine on the same hardware.
Wait, maybe it's actually using the GPU but just insanely bottlenecked on Python CPU performance?
I still have this issue! The GPU is being used when I use llama-cli.
Can you post logs? In my case the GPU was being detected and used, but it was bottlenecked by the CPU because the Python part is slow. I ended up building https://github.com/sanctuary-systems-com/llama_multiserver
Never mind, I used the wrong argument. Make sure that `ngl` (`--n_gpu_layers`) is not negative: `ngl 99999`, not `ngl -99999`, since only `ngl -1` in theory should offload all layers.
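For reference, a minimal sketch of how the sign of `n_gpu_layers` plays out when loading the same model through llama-cpp-python's `Llama` class. The model path is taken from the original report, and the exact wording of the offload log line may differ between versions:

```python
from llama_cpp import Llama

# n_gpu_layers > 0 offloads that many layers; -1 requests all layers.
# Other negative values are passed through as-is and may result in no offload.
llm = Llama(
    model_path="starcoderbase-3b/starcoderbase-3b.Q4_K_M.gguf",
    n_gpu_layers=-1,   # offload every layer to the GPU
    verbose=True,      # print the load log, including how many layers were offloaded
)

# Watch stderr for a line similar to "offloaded N/N layers to GPU";
# if it reports 0 offloaded layers, the wheel was likely built without GPU support.
print(llm("def fibonacci(n):", max_tokens=32)["choices"][0]["text"])
```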
After I install llama-cpp-python[server] with CUDA support and run `python3 -m llama_cpp.server --model starcoderbase-3b/starcoderbase-3b.Q4_K_M.gguf --n_gpu_layers 10`, the GPU is not getting used; it's running on the CPU.
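A common cause of this symptom is that the installed wheel was built without CUDA, so every layer silently falls back to the CPU. As a hedged sketch, one way to confirm what the installed binary supports before blaming the server arguments is shown below; `llama_supports_gpu_offload` is exposed by recent versions of the low-level bindings and may be absent in older releases, and the CUDA build flag has changed over time (`-DLLAMA_CUBLAS=on` in older releases, `-DGGML_CUDA=on` more recently), so check the llama-cpp-python README for your version before reinstalling:

```python
# Minimal diagnostic sketch: check whether the installed llama-cpp-python
# binary was built with GPU offload support at all. If this prints False,
# no value of --n_gpu_layers will move work onto the GPU and the package
# needs to be reinstalled with the CUDA build flags enabled.
import llama_cpp

print("llama-cpp-python version:", llama_cpp.__version__)
print("GPU offload supported:", llama_cpp.llama_supports_gpu_offload())
```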