8000 GitHub - fazil1305/llama-cpp-python: Python bindings for llama.cpp
[go: up one dir, main page]

Skip to content

fazil1305/llama-cpp-python

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🦙 Python Bindings for llama.cpp

Documentation Status Tests PyPI PyPI - Python Version PyPI - License PyPI - Downloads

Simple Python bindings for @ggerganov's llama.cpp library. This package provides:

  • Low-level access to C API via ctypes interface.
  • High-level Python API for text completion
    • OpenAI-like API
    • LangChain compatibility

Documentation is available at https://llama-cpp-python.readthedocs.io/en/latest.

Warning

Starting with version 0.1.79 the model format has changed from ggmlv3 to gguf. Old model files can be converted using the convert-llama-ggmlv3-to-gguf.py script in llama.cpp

Installation from PyPI

Install from PyPI (requires a c compiler):

pip install llama-cpp-python

The above command will attempt to install the package and build llama.cpp from source. This is the recommended installation method as it ensures that llama.cpp is built with the available optimizations for your system.

If you have previously installed llama-cpp-python through pip and want to upgrade your version or rebuild the package with different compiler options, please add the following flags to ensure that the package is rebuilt correctly:

pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir

Note: If you are using Apple Silicon (M1) Mac, make sure you have installed a version of Python that supports arm64 architecture. For example:

wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
bash Miniforge3-MacOSX-arm64.sh

Otherwise, while installing it will build the llama.ccp x86 version which will be 10x slower on Apple Silicon (M1) Mac.

Installation with Hardware Acceleration

llama.cpp supports multiple BLAS backends for faster processing.

To install with OpenBLAS, set the LLAMA_BLAS and LLAMA_BLAS_VENDOR environment variables before installing:

CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python

To install with cuBLAS, set the LLAMA_CUBLAS=1 environment variable before installing:

CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python

To install with CLBlast, set the LLAMA_CLBLAST=1 environment variable before installing:

CMAKE_ARGS="-DLLAMA_CLBLAST=on" pip install llama-cpp-python

To install with Metal (MPS), set the LLAMA_METAL=on environment variable before installing:

CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python

To install with hipBLAS / ROCm support for AMD cards, set the LLAMA_HIPBLAS=on environment variable before installing:

CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python

Windows remarks

To set the variables CMAKE_ARGSin PowerShell, follow the next steps (Example using, OpenBLAS):

$env:CMAKE_ARGS = "-DLLAMA_OPENBLAS=on"

Then, call pip after setting the variables:

pip install llama-cpp-python

See the above instructions and set CMAKE_ARGS to the BLAS backend you want to use.

MacOS remarks

Detailed MacOS Metal GPU install documentation is available at docs/install/macos.md

High-level API