Llava clip not loading to GPU in version 0.2.58 (Downgrading to 0.2.55 works) #1324
Closed
@FYYHU

Description


Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

I wanted to run LLaVA as described in the README: I used the provided example code and the linked GGUF files, and I installed the package with the cuBLAS flags mentioned in the documentation.

I expected both the CLIP vision tower and the LLM to be loaded on CUDA. A sketch of the reproduction is shown below.
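For reference, the reproduction is essentially the multimodal example from the README, built with the cuBLAS CMake flag described in the docs (something like `CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python`). The file paths and the image URL below are placeholders for the GGUF files linked in the README, so treat this as a sketch rather than my exact script:

```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# clip_model_path points at the downloaded mmproj GGUF, model_path at the LLaVA LLM GGUF
# (placeholder filenames; substitute the files linked from the README).
chat_handler = Llava15ChatHandler(clip_model_path="./mmproj-model-f16.gguf")
llm = Llama(
    model_path="./llava-v1.5-7b.Q4_K_M.gguf",
    chat_handler=chat_handler,
    n_ctx=2048,        # larger context to make room for the image embedding
    n_gpu_layers=-1,   # offload all LLM layers to CUDA
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an assistant who describes images."},
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
                {"type": "text", "text": "Describe this image in detail."},
            ],
        },
    ]
)
print(response["choices"][0]["message"]["content"])
```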

Current Behavior

On the latest version, 0.2.58, of llama-cpp-python, the CLIP model is forced onto the CPU backend while the LLM part uses CUDA. Downgrading llama-cpp-python to version 0.2.55 fixes this issue.

Environment and Context

OS: Ubuntu 22.04 - X86
CUDA: 11.8
Python: 3.8 (in miniconda)
llama-cpp-python: 0.2.58
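To confirm which build is active in the environment, a quick check (assuming the low-level `llama_supports_gpu_offload` binding is exposed in this build) looks like:

```python
import llama_cpp

# Report the installed package version and whether the native library was built with GPU offload.
print(llama_cpp.__version__)
print(llama_cpp.llama_supports_gpu_offload())
```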


Labels

bug (Something isn't working)
