Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
I wanted to run LLaVA as described in the README. I used the provided code and the linked GGUF files, and I installed the module with the cuBLAS flags mentioned in the documentation.
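For reference, the install was along these lines (assuming the `LLAMA_CUBLAS` build flag documented for this version):

```
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
```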
I expected both the CLIP vision tower and the LLM to be loaded on CUDA.
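This is roughly how I load the model, following the README's LLaVA example (the model file names below are placeholders for the linked GGUF files):

```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# Placeholder paths for the GGUF files linked in the README
chat_handler = Llava15ChatHandler(
    clip_model_path="mmproj-model-f16.gguf",  # CLIP vision tower
    verbose=True,  # prints which backend the CLIP model is loaded on
)
llm = Llama(
    model_path="llava-v1.5-7b.Q4_K_M.gguf",  # LLM part
    chat_handler=chat_handler,
    n_ctx=2048,       # larger context to make room for the image embeddings
    n_gpu_layers=-1,  # offload all LLM layers to CUDA
    logits_all=True,  # required by the llava chat handler
)
```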
Current Behavior
On the latest version of llama-cpp-python (0.2.58), the CLIP model is forced onto the CPU backend, while the LLM part uses CUDA. Downgrading llama-cpp-python to version 0.2.55 fixes this issue.
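That is, rebuilding the previous version with the same flags restores CUDA for the CLIP model:

```
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install --force-reinstall --no-cache-dir llama-cpp-python==0.2.55
```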
Environment and Context
- OS: Ubuntu 22.04 (x86_64)
- CUDA: 11.8
- Python: 3.8 (Miniconda)
- llama-cpp-python: 0.2.58