🦙 Python Bindings for llama.cpp
Simple Python bindings for @ggerganov's llama.cpp library.
This package provides:
- Low-level access to the C API via the ctypes interface
- High-level Python API for text completion (see the example below)
- OpenAI-like API
- LangChain compatibility
- LlamaIndex compatibility
- OpenAI-compatible web server (startup command below)
Documentation is available at https://llama-cpp-python.readthedocs.io/en/latest.
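As a quick illustration, here is a minimal sketch of the high-level API; the model path is a placeholder for a GGUF model you have downloaded:

from llama_cpp import Llama

# Load a local GGUF model (placeholder path).
llm = Llama(model_path="./models/7B/llama-model.gguf")

# OpenAI-style text completion.
output = llm(
    "Q: Name the planets in the solar system? A: ",
    max_tokens=32,
    stop=["Q:", "\n"],
    echo=True,
)
print(output["choices"][0]["text"])

The OpenAI-compatible web server ships as a package extra and can be started from the command line (again with a placeholder model path):

pip install 'llama-cpp-python[server]'
python3 -m llama_cpp.server --model models/7B/llama-model.gguf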
Requirements:
- Python 3.8+
- C compiler
  - Linux: gcc or clang
  - Windows: Visual Studio or MinGW
  - MacOS: Xcode
To install the package, run:
pip install llama-cpp-python
This will also build llama.cpp from source and install it alongside this Python package.
If this fails, add --verbose to the pip install command to see the full cmake build log.
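If you change build options later (for example via CMAKE_ARGS) and need to force a rebuild from source, reinstall while bypassing pip's cache:

pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir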
Pre-built Wheel (New)
It is also possible to install a pre-built wheel with basic CPU support.
pip install llama-cpp-python \
--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
llama.cpp supports a number of hardware acceleration backends to speed up inference, as well as backend-specific options. See the llama.cpp README for a full list.
All llama.cpp cmake build options can be set via the CMAKE_ARGS environment variable or via the --config-settings / -C cli flag during installation.
Environment Variables
# Linux and Mac
CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" \
pip install llama-cpp-python
# Windows
$env:CMAKE_ARGS = "-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS"
pip install llama-cpp-python
CLI / requirements.txt
These options can also be set via the pip install -C / --config-settings command and saved to a requirements.txt file:
pip install --upgrade pip # ensure pip is up to date
pip install llama-cpp-python \
-C cmake.args="-DLLAMA_BLAS=ON;-DLLAMA_BLAS_VENDOR=OpenBLAS"
# requirements.txt
llama-cpp-python -C cmake.args="-DLLAMA_BLAS=ON;-DLLAMA_BLAS_VENDOR=OpenBLAS"
Below are some common backends, their build commands and any additional environment variables required.
OpenBLAS (CPU)
To install with OpenBLAS, set the LLAMA_BLAS and LLAMA_BLAS_VENDOR CMake flags via the CMAKE_ARGS environment variable before installing:
CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
CUDA
To install with CUDA support, set the LLAMA_CUDA=on CMake flag via the CMAKE_ARGS environment variable before installing:
CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python
Pre-built Wheel (New)
It is also possible to install a pre-built wheel with CUDA support, as long as your system meets these requirements:
- CUDA Version is 12.1, 12.2 or 12.3
- Python Version is 3.10, 3.11 or 3.12
pip install llama-cpp-python \
--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/<cuda-version>
Where <cuda-version> is one of the following:
- cu121: CUDA 12.1
- cu122: CUDA 12.2
- cu123: CUDA 12.3
For example, to install the CUDA 12.1 wheel:
pip install llama-cpp-python \
--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
Metal
To install with Metal (MPS), set the LLAMA_METAL=on CMake flag via the CMAKE_ARGS environment variable before installing:
CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python
Pre-built Wheel (New)
It is also possible to install a pre-built wheel with Metal support, as long as your system meets these requirements:
- MacOS Version is 11.0 or later
- Python Version is 3.10, 3.11 or 3.12
pip install llama-cpp-python \
--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal
CLBlast (OpenCL)
To install with CLBlast, set the LLAMA_CLBLAST=on CMake flag via the CMAKE_ARGS environment variable before installing:
CMAKE_ARGS="-DLLAMA_CLBLAST=on" pip install llama-cpp-python
hipBLAS (ROCm)
To install with hipBLAS / ROCm support for AMD cards, set the LLAMA_HIPBLAS=on CMake flag via the CMAKE_ARGS environment variable before installing:
CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python
Vulkan
To install with Vulkan support, set the LLAMA_VULKAN=on CMake flag via the CMAKE_ARGS environment variable before installing:
CMAKE_ARGS="-DLLAMA_VULKAN=on" pip install llama-cpp-python
Kompute
To install with Kompute support, set the LLAMA_KOMPUTE=on CMake flag via the CMAKE_ARGS environment variable before installing:
CMAKE_ARGS="-DLLAMA_KOMPUTE=on" pip install llama-cpp-python
SYCL
To install with SYCL support, set the LLAMA_SYCL=on CMake flag via the CMAKE_ARGS environment variable before installing:
source /opt/intel/oneapi/setvars.sh
CMAKE_ARGS="-DLLAMA_SYCL=on -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx" pip install llama-cpp-python
Error: Can't find 'nmake' or 'CMAKE_C_COMPILER'
If you run into issues where it complains it can't find 'nmake' or 'CMAKE_C_COMPILER', you can extract w64devkit as mentioned in the llama.cpp repo and add those manually to CMAKE_ARGS before running pip install:
$env:CMAKE_GENERATOR = "MinGW Makefiles"
$env:CMAKE_ARGS = "-DLLAMA_OPENBLAS=on -DCMAKE_C_COMPILER=C:/w64devkit/bin/gcc.exe -DCMAKE_CXX_COMPILER=C:/w64devkit/bin/g++.exe"
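Then run the installation as usual:

pip install llama-cpp-python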
See the above instructions and set CMAKE_ARGS to the BLAS backend you want to use.
Detailed MacOS Metal GPU install documentation is available at docs/install/macos.md
M1 Mac Performance Issue
Note: If you are using an Apple Silicon (M1) Mac, make sure you have installed a version of Python that supports the arm64 architecture. For example:
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
bash Miniforge3-MacOSX-arm64.sh
Otherwise, the installation will build the x86 version of llama.cpp, which will be 10x slower on Apple Silicon (M1) Macs.
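As an optional check (not part of the original instructions), you can confirm which architecture your Python interpreter was built for; it should report arm64, not x86_64:

python3 -c "import platform; print(platform.machine())"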
M Series Mac Error: `(mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64'))`
Try installing with:
CMAKE_ARGS="-DCMAKE_OSX_ARCHITECTURES=arm64 -DCMAKE_APPLE_SILICON_PROCESSOR=arm64 -DLLAMA_METAL=on" pip install --upgrade --verbose --force-reinstall --no-cache-dir llama-cpp-python