Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
I wanted to run LLaVA as described in the README. I used the provided code and the linked GGUF files, and I installed the module with the cuBLAS flags mentioned in the documentation.
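For reference, the install was along these lines (assuming the `LLAMA_CUBLAS` build flag documented for this version):

```
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
```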
I expected both the CLIP vision tower and the LLM to be loaded on CUDA.
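This is roughly how I load the model, following the README's LLaVA example (the model file names below are placeholders for the linked GGUF files):

```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# Placeholder paths for the GGUF files linked in the README
chat_handler = Llava15ChatHandler(
    clip_model_path="mmproj-model-f16.gguf",  # CLIP vision tower
    verbose=True,  # prints which backend the CLIP model is loaded on
)
llm = Llama(
    model_path="llava-v1.5-7b.Q4_K_M.gguf",  # LLM part
    chat_handler=chat_handler,
    n_ctx=2048,       # larger context to make room for the image embeddings
    n_gpu_layers=-1,  # offload all LLM layers to CUDA
    logits_all=True,  # required by the llava chat handler
)
```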
Current Behavior
On the latest version of llama-cpp-python (0.2.58), the CLIP model is forced onto the CPU backend, while the LLM part uses CUDA. Downgrading llama-cpp-python to version 0.2.55 fixes this issue.
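That is, rebuilding the previous version with the same flags restores CUDA for the CLIP model:

```
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install --force-reinstall --no-cache-dir llama-cpp-python==0.2.55
```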
Environment and Context
- OS: Ubuntu 22.04 (x86_64)
- CUDA: 11.8
- Python: 3.8 (Miniconda)
- llama-cpp-python: 0.2.58