Eval bug: A100 GPU not working with CUDA 12.8 in llama.cpp #13609


Open
vladrad opened this issue May 17, 2025 · 5 comments
vladrad commented May 17, 2025

Name and Version

I'm encountering an issue with llama.cpp after upgrading to CUDA 12.8. My setup includes an RTX A6000 Pro (Blackwell) and an A100 80GB GPU. The A6000 works fine when using CUDA_VISIBLE_DEVICES=0. However, when switching to CUDA_VISIBLE_DEVICES=1 for the A100, the application crashes.

I went back and compiled for all architectures, then tried 80 and 120 specifically. The Blackwell card works, but the A100 (even though it shows up fine in nvidia-smi) crashes at startup. Even running llama-cli --help prints this error, and the backend always falls back to CPU.
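
An architecture-targeted rebuild looks roughly like this (a sketch assuming the standard llama.cpp CMake flow, not the exact command line used here):

# hypothetical reconstruction: build the CUDA backend for CC 8.0 (A100) and CC 12.0 (Blackwell)
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="80;120"
cmake --build build --config Release -j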

Version:

ggml_cuda_init: failed to initialize CUDA: initialization error
version: 5400 (c6a2c9e7)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

Operating systems

Linux

GGML backends

CUDA

Hardware

RTX A6000 Pro (Blackwell)
A100 80GB

Models

Any model

Problem description & steps to reproduce

Compile with CUDA 12.8 and use an A100. CUDA fails to initialize.

First Bad Commit

No response

Relevant log output

CUDA 12.8
NVIDIA-SMI 570.133.20             Driver Version: 570.133.20  
Ubuntu 22.04

./llama.cpp/build/bin/llama-cli -hf lmstudio-community/Qwen3-4B-GGUF --n-gpu-layers 99 --ctx-size 32000
ggml_cuda_init: failed to initialize CUDA: initialization error
warning: no usable GPU found, --gpu-layers option will be ignored
warning: one possible reason is that llama.cpp was compiled without GPU support
warning: consult docs/build.md for compilation instructions
JohannesGaessler (Collaborator)

Does it work when you compile with GGML_NATIVE=OFF?
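
For concreteness, the rebuild would look something like this (a sketch of the usual CMake invocation):

# rebuild with autodetection of the host GPU's architecture disabled
cmake -B build -DGGML_CUDA=ON -DGGML_NATIVE=OFF
cmake --build build --config Release -j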

vladrad (Author) commented May 17, 2025

Thank you for the response. I tried a bunch of different things... I wonder if this is a multi-GPU bug or a driver issue.
If I set CUDA_VISIBLE_DEVICES to 0, it works; if I set it to 1, it fails. However, after

sudo rmmod nvidia_uvm
sudo modprobe nvidia_uvm

device 1 now works, but device 0 does not. Has anyone else seen anything like that?
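
A minimal per-device smoke test after the module reload (a sketch, assuming the binary sits in ./build/bin):

# run one trivial llama-cli invocation per GPU; a ggml_cuda_init failure shows up immediately
for i in 0 1; do
  echo "=== CUDA_VISIBLE_DEVICES=$i ==="
  CUDA_VISIBLE_DEVICES=$i ./build/bin/llama-cli --version
done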

vladrad (Author) commented May 18, 2025

Hmm, I can't tell if this is a CUDA bug or not.

I wanted to see if something else could work, so I tried the Python transformers library with nightly PyTorch. It showed the same behavior: it works when CUDA_VISIBLE_DEVICES pins a single device, but has issues when both GPUs are visible. In PyTorch it fails on the device-enumeration call.
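
The enumeration check was along these lines (a sketch; the exact script isn't shown here):

# count and name the visible devices; this is the kind of call that failed
python3 -c 'import torch; print(torch.cuda.device_count())'
python3 -c 'import torch; print([torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())])'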

I am assuming this is a driver/CUDA issue?

JohannesGaessler (Collaborator)

Does it work with GGML_NATIVE=OFF or not? What I think is happening is that during compilation the Blackwell card is autodetected, code is compiled only for real architecture 12.0, and the program then crashes when it tries to run that code on the A100, which has compute capability 8.0.
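
One way to check this (a sketch using standard CUDA tooling): list the real architectures embedded in the binary and compare them against what the driver reports for each GPU:

# list the cubins (real architectures) embedded in the build
# (inspect libggml-cuda.so instead if the backend was built as a shared library)
cuobjdump --list-elf build/bin/llama-cli
# report each GPU's compute capability
nvidia-smi --query-gpu=index,name,compute_cap --format=csv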

vladrad (Author) commented May 18, 2025

Hey @JohannesGaessler, sorry for the late response.

It did not work with GGML_NATIVE=OFF. I recompiled and still got the same issues. It seems like both GPUs can't be used at the same time with the new driver?

I even tried other libraries besides llama.cpp, and they all seem to have this issue.
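
Since every library shows the same behavior, the kernel log is worth checking for driver/UVM errors around a failed init (a generic diagnostic sketch, not specific to llama.cpp):

# look for error messages from the NVIDIA kernel modules
sudo dmesg | grep -iE 'nvrm|uvm' | tail -n 50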
