Description
Multi-GPU inference is essential for GPUs with small VRAM: a 13B LLaMA model cannot fit on a single RTX 3090 without quantization.
llama.cpp merged its multi-GPU branch yesterday, which lets us deploy LLMs across several small-VRAM GPUs:
ggml-org/llama.cpp#1703
I hope llama-cpp-python can support multi-GPU inference in the future.
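
For reference, a rough sketch of how this might look on the Python side, assuming a hypothetical `tensor_split` parameter that mirrors llama.cpp's new `--tensor-split` / `--n-gpu-layers` options from the linked PR (the exact API name and signature are just an illustration, not an existing feature):

```python
# Hypothetical sketch: splitting a 13B model across two GPUs via llama-cpp-python.
# `tensor_split` is assumed here for illustration, modeled on llama.cpp's
# --tensor-split option; it is not (yet) part of the bindings.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-13b/ggml-model-f16.bin",
    n_gpu_layers=40,          # offload transformer layers to the GPU(s)
    tensor_split=[0.5, 0.5],  # hypothetical: distribute tensors evenly across two GPUs
)

out = llm("Q: Why is multi-GPU inference useful? A:", max_tokens=64)
print(out["choices"][0]["text"])
```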
Many thanks!!!