A comprehensive, step-by-step guide for successfully installing and running llama-cpp-python with CUDA GPU acceleration on Windows. This repository provides a definitive solution to the common installation challenges, including exact version requirements, environment setup, and troubleshooting tips.
Since I couldn't find a comprehensive guide or a reliable solution for getting `llama-cpp-python` running smoothly with CUDA on Windows, here is my consolidated approach. It has already helped two of my friends past various installation issues by focusing on a clean, local environment and specific package versions.
- **Local, deletable Conda environment:** created directly in the project folder for easy cleanup (using `conda create --prefix ./env python=3.11`).
- **Specific CUDA Toolkit installation via Conda:** uses `conda install nvidia/label/cuda-12.1.0::cuda-toolkit`.
- **Precise PyTorch installation for CUDA 12.1:** uses `pip3 install torch --index-url https://download.pytorch.org/whl/cu121`.
- **Precise Visual Studio 2019 configuration:** exact specification of the required components, with a direct download link.
- **Specific download links:** for Visual Studio 2019, the (system) CUDA Toolkit 12.1.0, and CMake 3.31.7.
- **Detailed environment variable setup:** for the system-level CUDA, which helps the C++ compiler find everything.
- **Troubleshooting:** tips for cleaning up after failed attempts.
- **Correct `llama-cpp-python` installation:** including the build arguments needed for CUDA.
If you've had previous, unsuccessful installation attempts of `llama-cpp-python` or similar packages, it's advisable to remove potential remnants first:

- **Uninstall Visual Studio 2022:** If VS 2022 is installed and causing problems, uninstall it via "Apps & Features" in Windows Settings. This guide focuses on VS 2019.
- **Delete temporary files and caches:**
  - Close all terminals and development environments.
  - Open Windows Explorer and type `%TEMP%` into the address bar. Delete the contents of this folder (some files might be locked, which is okay).
  - Type `%APPDATA%` into the address bar. Look for folders related to `pip` or `cmake` (e.g., `pip/cache`) and delete their contents, or the folders themselves if you're sure. Be cautious here.
- If necessary, delete old, faulty Conda environments (especially those not created with `--prefix` inside a project folder).
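Before deleting anything, you can inspect the candidate cache locations programmatically. This is a minimal sketch, not part of the guide's official steps; the exact subfolder names (`pip`, `cmake`) are assumptions and vary by machine, and the script only lists paths, it never deletes:

```python
import os
import tempfile
from pathlib import Path

# Candidate cache locations on Windows; on other systems the env vars
# may be absent, so fall back to sensible defaults.
temp_dir = Path(os.environ.get("TEMP", tempfile.gettempdir()))
appdata = os.environ.get("APPDATA")

candidates = [temp_dir]
if appdata:
    # pip and cmake keep per-user caches under %APPDATA%
    candidates += [Path(appdata) / "pip", Path(appdata) / "cmake"]

for path in candidates:
    status = "exists" if path.exists() else "not found"
    print(f"{path}  ->  {status}")
```

Review the printed paths by hand before clearing anything, exactly as the cautionary note above advises.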
Even though we install a CUDA toolkit into Conda, a system-level CUDA Toolkit (so that the C++ compiler and `nvcc` are found easily by CMake/pip during the build) and a correct Visual Studio setup are crucial.

Check your NVIDIA drivers and any system-installed CUDA Toolkits:

- **Driver & supported CUDA version (`nvidia-smi`):** Open PowerShell and enter:

  ```powershell
  nvidia-smi
  ```

  Note the "CUDA Version" in the top right. This is the maximum version supported by your current driver.

- **Installed system CUDA Toolkits (`nvcc --version`):** If you already have a system-wide CUDA Toolkit installed:

  ```powershell
  nvcc --version
  ```

  This shows the version of the toolkit currently found in the system path. This guide assumes you'll have CUDA 12.1 system-wide for the compiler.
`llama-cpp-python` requires a C++ compiler. Visual Studio 2019 is recommended.

- **Download:** Download Visual Studio 2019 Community Edition directly from archive.org.
- **Installation – workloads & components:**
  - Select the workload "Desktop development with C++".
  - Go to "Individual components" and ensure these are selected:
    - MSVC v142 - VS 2019 C++ x64/x86 build tools (Latest)
    - Windows 10 SDK (e.g., 10.0.19041.0)
    - Windows 11 SDK (e.g., 10.0.22000.0 or newer) – crucial!
    - C++ CMake tools for Windows

Example of selected components: (screenshot showing the required components to select during installation)
- **Download CMake version 3.31.7.**
- During installation, enable "Add CMake to the system PATH for all users".
This system-level installation helps the C++ build tools find `nvcc`.

- **Download CUDA Toolkit 12.1.0:** Download it from NVIDIA (select Windows, x86_64, version 10/11, `exe (local)`).
- **Installation:** Choose "Custom (Advanced)" and select "CUDA" (including Visual Studio Integration). Note the path: `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1`.

This is critical for `nvcc` to be found by the build process.
- **Open Environment Variables:** Search for "Edit the system environment variables".
- **Set/check `CUDA_PATH` and `CUDA_PATH_V12_1` (system variables):**
  - Name: `CUDA_PATH`
  - Value: `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1`
- **Edit the `Path` variable (system variables):** Ensure these entries are at the top:
  - `%CUDA_PATH%\bin`
  - `%CUDA_PATH%\libnvvp`
- **Apply changes and restart PowerShell:** Close ALL PowerShell windows and open a new one.
- **Verification in the new PowerShell:**

  ```powershell
  echo $env:CUDA_PATH
  nvcc --version   # Should show 12.1
  ```
- **Create project folder:**

  ```powershell
  mkdir D:\AI\LlamaCPPProject
  cd D:\AI\LlamaCPPProject
  ```

- **Create a local Conda environment (named `env`, inside the project):** This is key for easy deletion if things go wrong.

  ```powershell
  # Important: Use the x64 version of PowerShell
  conda create --prefix ./env python=3.11
  ```

- **Activate the Conda environment:**

  ```powershell
  conda activate ./env
  # Or from the project root: conda activate .\env
  ```

  Your PowerShell prompt should change to show `(./env)`.
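Beyond the prompt prefix, you can confirm from inside Python that the local environment is the one actually running. A quick sanity-check sketch (the `env` folder name matches this guide's setup; adjust if you named it differently):

```python
import os
import sys

# When the local ./env environment is active, the interpreter lives
# inside the project folder rather than in the global Conda install.
print(f"Interpreter prefix: {sys.prefix}")
print(f"Executable:         {sys.executable}")

# Rough check: is an 'env' folder a path component of the active prefix?
in_local_env = "env" in os.path.normpath(sys.prefix).split(os.sep)
print(f"Running from a local 'env' folder: {in_local_env}")
```

If the prefix points at your global Conda installation instead of `D:\AI\LlamaCPPProject\env`, re-activate with `conda activate ./env` before continuing.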
Now, inside the activated `env` environment:

- **Install CUDA Toolkit 12.1.0 via Conda:** This provides the CUDA runtime libraries specifically for this environment.

  ```powershell
  # Ensure (./env) is active
  conda install nvidia/label/cuda-12.1.0::cuda-toolkit -c nvidia/label/cuda-12.1.0
  ```

  Note: The channel specification `-c nvidia/label/cuda-12.1.0` might be redundant since the package name already includes it, but it ensures the correct source.

- **Install PyTorch for CUDA 12.1 (no audio, no vision):** This specific command installs PyTorch compiled for CUDA 12.1. `pip3` is often an alias for `pip` within Conda environments; use `pip` if `pip3` is not found.

  ```powershell
  # Ensure (./env) is active
  pip3 install torch --index-url https://download.pytorch.org/whl/cu121
  # If pip3 gives an error, try:
  # pip install torch --index-url https://download.pytorch.org/whl/cu121
  ```
Still inside the activated `env` environment:

- **Set build arguments in PowerShell:** These instruct `pip` to compile `llama-cpp-python` with CUDA support (cuBLAS) and to use the system's `nvcc.exe` (found via `CUDA_PATH`).

  ```powershell
  # Ensure (./env) is active!
  $env:CMAKE_ARGS="-DLLAMA_CUBLAS=on"
  # This line is crucial for explicitly pointing to the CUDA compiler:
  $env:CUDA_CXX="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc.exe"
  ```

- **Installation via pip:** Use `pip` (or `pip3` if that's your preference and it works in your Conda env).

  ```powershell
  # Ensure (./env) is active
  pip install llama-cpp-python[server] --upgrade --force-reinstall --no-cache-dir
  ```

  This process might take some time. Look for "Successfully built llama-cpp-python".
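Once the build finishes, a quick way to confirm what landed in the environment is to query the package metadata. This sketch only proves the wheel is installed, not that CUDA offload works; the full test in the next section covers that:

```python
from importlib.metadata import PackageNotFoundError, version

# Confirms the wheel is present in the active environment; it does not
# by itself prove that CUDA support was compiled in.
try:
    installed = version("llama-cpp-python")
except PackageNotFoundError:
    installed = None

print(f"llama-cpp-python: {installed or 'not installed'}")
```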
- **Start Python in your activated Conda environment (`env`):**

  ```powershell
  # Ensure (./env) is active
  python
  ```

- **Test PyTorch CUDA availability and the `llama-cpp-python` import:**

  ```python
  import torch
  import os

  print(f"PyTorch version: {torch.__version__}")
  print(f"Is CUDA available for PyTorch? {torch.cuda.is_available()}")
  if torch.cuda.is_available():
      print(f"PyTorch CUDA version: {torch.version.cuda}")
      print(f"Number of GPUs PyTorch sees: {torch.cuda.device_count()}")
      if torch.cuda.device_count() > 0:
          print(f"Current GPU Model (PyTorch): {torch.cuda.get_device_name(0)}")

  print("\nAttempting to import Llama...")
  try:
      from llama_cpp import Llama
      print("Llama imported successfully!")
      # For a more thorough test, you'd load a model with n_gpu_layers > 0
      # llm = Llama(model_path="path/to/your.gguf", n_gpu_layers=30)
      # print("Llama object initialized (this would test actual GPU offload).")
  except Exception as e:
      print(f"Error importing or initializing Llama: {e}")

  print("\nChecking CMAKE_ARGS from Python environment:")
  print(f"CMAKE_ARGS: {os.environ.get('CMAKE_ARGS')}")
  quit()
  ```
- **Download models:** Get GGUF models (e.g., from TheBloke on Hugging Face) and place them in a subfolder like `D:\AI\LlamaCPPProject\models`.
- **Start the application (example with the Python server):**

  ```powershell
  # Ensure (./env) is active!
  # Ensure $env:CMAKE_ARGS="-DLLAMA_CUBLAS=on" is set in this PowerShell session
  python -m llama_cpp.server --model D:\AI\LlamaCPPProject\models\YOUR_MODEL.gguf --n_gpu_layers -1
  ```

  Monitor for GPU offload messages and check Task Manager for GPU activity.
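Once the server is up, it exposes an OpenAI-compatible HTTP API, by default on `127.0.0.1:8000`. A minimal request sketch using only the standard library; the host, port, and prompt here are assumptions matching the server's defaults, so adjust them if you passed `--host` or `--port`:

```python
import json
import urllib.request

# Default address of llama_cpp.server; adjust if you pass --host/--port.
URL = "http://127.0.0.1:8000/v1/completions"

payload = {
    "prompt": "Q: What is the capital of France? A:",
    "max_tokens": 32,
    "temperature": 0.7,
}

request = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(request, timeout=30) as response:
        result = json.load(response)
        print(result["choices"][0]["text"])
except OSError as exc:
    # The server is not running or is unreachable; start it first.
    print(f"Could not reach the server: {exc}")
```

With the server running and a model loaded, this prints the model's completion; while it generates, you should see GPU activity in Task Manager.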
While this guide uses CUDA 12.1 (because it worked reliably for this specific setup), here's a more universal approach for Windows AI/ML development:
Recommendation: Install CUDA 11.8, 12.6, and 12.8 on your system. These three versions cover compatibility with almost every AI project you'll encounter.
Two main approaches exist for PyTorch installation:
- Official PyTorch Builds: Current official support for CUDA 11.8, 12.6, and 12.8 only
- Pre-built Wheels: Community/third-party wheels available for more CUDA versions (like 12.1, 12.4)
Use Cases by CUDA Version:
- CUDA 11.8: Legacy projects, older Stable Diffusion models, most GitHub repositories from 2022-2023
- CUDA 12.6: Current mainstream AI projects, latest PyTorch features, balanced compatibility
- CUDA 12.8: Cutting-edge frameworks, latest GPU architectures (RTX 50xx series), experimental features
Why These Specific Versions?
- CUDA 11.8: Compatible with most older AI frameworks and models
- CUDA 12.6: Current stable PyTorch official support and modern projects
- CUDA 12.8: Latest stable version for cutting-edge frameworks and newest GPUs
- **System environment variables:**
  - Configure `CUDA_PATH` and `CUDA_HOME` to point to your primary CUDA version.
  - Add all CUDA `bin` directories to the system `Path` variable (found in System Environment Variables).

- **Install PyTorch with exact version matching:**

  For pip users (official PyTorch builds – recommended):

  ```powershell
  # CUDA 11.8
  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
  # CUDA 12.6
  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
  # CUDA 12.8
  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
  ```

  For conda users:

  ```powershell
  # CUDA 11.8
  conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
  # CUDA 12.1 (for this guide - uses pre-built wheels)
  conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
  # CUDA 12.6
  conda install pytorch torchvision torchaudio pytorch-cuda=12.6 -c pytorch -c nvidia
  ```
Result: With this setup, you can clone and run almost any AI project from GitHub without build issues or CUDA compatibility problems.
⚠️ I have CUDA 11.8, 12.4, and 12.8 installed on my system – you can install as many CUDA versions side by side as you like!

⚠️ See the critical setup requirements above on configuring your system environment variables (CUDA_PATH, CUDA_HOME, Path)! Always make sure these point to the CUDA version you want to use.

💡 Here's an additional tip: 💡
When you clone an AI repository (e.g., for LLMs, diffusion models, etc.), it's good practice to first check the `requirements.txt` file (or similar dependency files). It often specifies the exact Torch version the project requires. You can then visit the PyTorch "Previous Versions" page to see which CUDA version is best suited for that Torch version and find the correct installation command. This helps you avoid compatibility issues from the start.

❗❗ Before anything else, first set your environment variables, then install the correct CUDA-enabled version of PyTorch – never install `requirements.txt` before completing these steps. ❗❗
Last Updated: June 2025