Support llama.cpp "Multi GPU support, CUDA refactor, CUDA scratch buffer" #344
Closed
@wyhanz

Description

Multi-GPU inference is essential for GPUs with limited VRAM: a 13B LLaMA model cannot fit on a single RTX 3090 without quantization.

llama.cpp merged its multi-GPU branch yesterday, which lets us deploy LLMs across several small-VRAM GPUs:
ggml-org/llama.cpp#1703

I hope llama-cpp-python can support multi-GPU inference in the future.
Many thanks!
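
For reference, a minimal sketch of what this could look like from the Python side, assuming llama-cpp-python forwards llama.cpp's new multi-GPU parameters (the `tensor_split` and `main_gpu` keyword names here are assumptions mirroring the upstream C API, not the current binding; `n_gpu_layers` already exists):

```python
# Hypothetical sketch: splitting a 13B model across two GPUs.
# tensor_split and main_gpu are assumed bindings for the parameters
# added upstream in ggml-org/llama.cpp#1703.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/13B/ggml-model-f16.bin",
    n_gpu_layers=40,          # offload all 40 layers of a 13B model
    tensor_split=[0.5, 0.5],  # split tensors evenly between GPU 0 and GPU 1
    main_gpu=0,               # GPU used for scratch buffers and small tensors
)

output = llm("Q: Why split a model across GPUs? A:", max_tokens=64)
print(output["choices"][0]["text"])
```

This also assumes the shared library was built with CUDA support, e.g. `CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python`.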

Metadata

Labels

enhancement (New feature or request), llama.cpp (Problem with llama.cpp shared lib)
