Python Bindings for llama.cpp
Simple Python bindings for @ggerganov's llama.cpp library.
This package provides:
- Low-level access to C API via ctypes interface.
- High-level Python API for text completion
- OpenAI-like API
- LangChain compatibility
- LlamaIndex compatibility
- OpenAI compatible web server
Documentation is available at https://llama-cpp-python.readthedocs.io/en/latest.
Requirements:
- Python 3.8+
- C compiler
    - Linux: gcc or clang
    - Windows: Visual Studio or MinGW
    - macOS: Xcode
To install the package, run:
pip install llama-cpp-python
This will also build llama.cpp from source and install it alongside this Python package.
If this fails, add --verbose to the pip install command to see the full cmake build log.
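Once the install finishes, one quick way to confirm that the package can be found is to look it up without importing it. The following is a minimal sketch using only the standard library. It assumes the import name is llama_cpp, and the helper name is_installed is purely illustrative, not part of this package.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located on the current path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "llama_cpp" is assumed to be the import name of llama-cpp-python.
    print("llama_cpp importable:", is_installed("llama_cpp"))
```

If this reports False right after a seemingly successful install, the package was probably installed into a different Python environment than the one running the check.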
Pre-built Wheel (New)
It is also possible to install a pre-built wheel with basic CPU support.
pip install llama-cpp-python \
--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
llama.cpp supports a number of hardware acceleration backends to speed up inference, as well as backend-specific options. See the llama.cpp README for a full list.
All llama.cpp CMake build options can be set via the CMAKE_ARGS environment variable or via the --config-settings / -C CLI flag during installation.
Environment Variables
# Linux and Mac
CMAKE_ARGS="-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS" \
pip install llama-cpp-python
# Windows
$env:CMAKE_ARGS = "-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS"
pip install llama-cpp-python
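The same environment-variable approach can be scripted, which may be convenient when one setup script has to behave identically on Linux, macOS and Windows. This is a sketch rather than part of the package: build_install is a hypothetical helper, and nothing is installed unless you pass its result to subprocess.run yourself.

```python
import os
import sys
from typing import Dict, List, Tuple


def build_install(cmake_flags: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Return (environment, command) for a source build with the given CMake flags."""
    env = dict(os.environ)
    # CMAKE_ARGS carries all flags as one space-separated string.
    env["CMAKE_ARGS"] = " ".join(cmake_flags)
    # Use the current interpreter's pip so the build targets this environment.
    cmd = [sys.executable, "-m", "pip", "install", "llama-cpp-python"]
    return env, cmd


env, cmd = build_install(["-DGGML_BLAS=ON", "-DGGML_BLAS_VENDOR=OpenBLAS"])
print(env["CMAKE_ARGS"])
# To actually run the build:
#   import subprocess
#   subprocess.run(cmd, env=env, check=True)
```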
CLI / requirements.txt
These options can also be set via the pip install -C / --config-settings flag and saved to a requirements.txt file:
pip install --upgrade pip # ensure pip is up to date
pip install llama-cpp-python \
-C cmake.args="-DGGML_BLAS=ON;-DGGML_BLAS_VENDOR=OpenBLAS"
# requirements.txt
llama-cpp-python -C cmake.args="-DGGML_BLAS=ON;-DGGML_BLAS_VENDOR=OpenBLAS"
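Note that the examples above separate flags with spaces in CMAKE_ARGS but with semicolons in cmake.args. A small sketch of converting one form into the other follows; the function name to_config_setting is an illustrative assumption, not something the package provides.

```python
import shlex


def to_config_setting(cmake_args: str) -> str:
    """Convert a space-separated CMAKE_ARGS string into a quoted cmake.args setting."""
    flags = shlex.split(cmake_args)
    # The -C form joins flags with semicolons; quotes keep the shell from splitting on them.
    return 'cmake.args="' + ";".join(flags) + '"'


setting = to_config_setting("-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS")
print(f"pip install llama-cpp-python -C {setting}")
print(f"llama-cpp-python -C {setting}")  # equivalent requirements.txt line
```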
Below are some common backends, their build commands and any additional environment variables required.
OpenBLAS (CPU)
To install with OpenBLAS, set the GGML_BLAS and GGML_BLAS_VENDOR CMake options via the CMAKE_ARGS environment variable before installing:
CMAKE_ARGS="-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
CUDA
To install with CUDA support, set the GGML_CUDA=on CMake option via the CMAKE_ARGS environment variable before installing:
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
Pre-built Wheel (New)
It is also possible to install a pre-built wheel with CUDA support, as long as your system meets the following requirements:
- CUDA Version is 12.1, 12.2, 12.3, 12.4 or 12.5
- Python Version is 3.10, 3.11 or 3.12
pip install llama-cpp-python \
--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/<cuda-version>
Where <cuda-version> is one of the following:
- cu121: CUDA 12.1
- cu122: CUDA 12.2
- cu123: CUDA 12.3
- cu124: CUDA 12.4
- cu125: CUDA 12.5
For example, to install the CUDA 12.1 wheel:
pip install llama-cpp-python \
--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
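The mapping from CUDA version to wheel index follows a simple pattern, so it can be derived programmatically. The sketch below encodes only the versions listed above; cuda_wheel_index is a hypothetical helper name, and the code does not check that a wheel actually exists at the resulting URL.

```python
SUPPORTED_CUDA = ("12.1", "12.2", "12.3", "12.4", "12.5")
BASE_URL = "https://abetlen.github.io/llama-cpp-python/whl"


def cuda_wheel_index(cuda_version: str) -> str:
    """Map a CUDA version such as '12.1' to its --extra-index-url value."""
    if cuda_version not in SUPPORTED_CUDA:
        raise ValueError(f"no pre-built wheel listed for CUDA {cuda_version}")
    # "12.1" becomes the tag "cu121".
    tag = "cu" + cuda_version.replace(".", "")
    return f"{BASE_URL}/{tag}"


print(cuda_wheel_index("12.1"))
```

Rejecting unlisted versions up front avoids a confusing pip error later, since pip would otherwise fall back to a source build when the extra index has nothing to offer.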
Metal
To install with Metal (MPS), set the GGML_METAL=on CMake option via the CMAKE_ARGS environment variable before installing:
CMAKE_ARGS="-DGGML_METAL=on" pip install llama-cpp-python
Pre-built Wheel (New)
It is also possible to install a pre-built wheel with Metal support, as long as your system meets the following requirements:
- macOS Version is 11.0 or later
- Python Version is 3.10, 3.11 or 3.12
pip install llama-cpp-python \
--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal
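The two requirements above are easy to check before choosing between the pre-built wheel and a source build. This is a minimal sketch under the stated requirements only; metal_wheel_supported is a hypothetical helper, and it takes the versions as arguments so it can be exercised on any platform.

```python
from typing import Tuple

SUPPORTED_PYTHON = {(3, 10), (3, 11), (3, 12)}


def metal_wheel_supported(macos_version: str, python_version: Tuple[int, int]) -> bool:
    """Check the stated requirements: macOS 11.0 or later, Python 3.10 to 3.12."""
    parts = macos_version.split(".")
    # An empty or malformed version string (e.g. on non-macOS systems) counts as unsupported.
    major = int(parts[0]) if parts[0].isdigit() else 0
    return major >= 11 and tuple(python_version[:2]) in SUPPORTED_PYTHON


print(metal_wheel_supported("14.5", (3, 11)))
print(metal_wheel_supported("10.15.7", (3, 11)))
```

On a real machine, the arguments could come from platform.mac_ver()[0] and sys.version_info[:2].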
hipBLAS (ROCm)
To install with hipBLAS / ROCm support for AMD cards, set the GGML_HIPBLAS=on CMake option via the CMAKE_ARGS environment variable before installing:
CMAKE_ARGS="-DGGML_HIPBLAS=on" pip install llama-cpp-python
Vulkan
To install with Vulkan support, set the GGML_VULKAN=on CMake option via the CMAKE_ARGS environment variable before installing:
CMAKE_ARGS="-DGGML_VULKAN=on" pip install llama-cpp-python
SYCL
To install with SYCL support, set the GGML_SYCL=on CMake option via the CMAKE_ARGS environment variable before installing:
source /opt/intel/oneapi/setvars.sh
CMAKE_ARGS="-DGGML_SYCL=on -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx" pip install llama-cpp-python
RPC
To install with RPC support, set the GGML_RPC=on CMake option via the CMAKE_ARGS environment variable before installing:
CMAKE_ARGS="-DGGML_RPC=on" pip install llama-cpp-python
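The backend sections above all follow the same pattern, which can be summarized as a lookup table. The sketch below collects the flags exactly as listed in this document; the dictionary keys and the install_command helper are illustrative names, not part of the package.

```python
BACKEND_CMAKE_ARGS = {
    "openblas": "-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS",
    "cuda": "-DGGML_CUDA=on",
    "metal": "-DGGML_METAL=on",
    "hipblas": "-DGGML_HIPBLAS=on",
    "vulkan": "-DGGML_VULKAN=on",
    # SYCL additionally needs `source /opt/intel/oneapi/setvars.sh` beforehand.
    "sycl": "-DGGML_SYCL=on -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx",
    "rpc": "-DGGML_RPC=on",
}


def install_command(backend: str) -> str:
    """Return the Linux/macOS shell one-liner for one of the backends above."""
    try:
        args = BACKEND_CMAKE_ARGS[backend.lower()]
    except KeyError:
        raise ValueError(f"unknown backend: {backend!r}") from None
    return f'CMAKE_ARGS="{args}" pip install llama-cpp-python'


for name in BACKEND_CMAKE_ARGS:
    print(f"{name:8s} {install_command(name)}")
```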
Error: Can't find 'nmake' or 'CMAKE_C_COMPILER'
If the build complains that it can't find 'nmake' '?' or CMAKE_C_COMPILER, you can extract w64devkit as mentioned in the llama.cpp repo and add the compiler paths manually to CMAKE_ARGS before running pip install:
$env:CMAKE_GENERATOR = "MinGW Makefiles"
$env:CMAKE_ARGS = "-DGGML_OPENBLAS=on -DCMAKE_C_COMPILER=C:/w64devkit/bin/gcc.exe -DCMAKE_CXX_COMPILER=C:/w64devkit/bin/g++.exe"