Can't install GPU version for windows for many times. · Issue #1393 · abetlen/llama-cpp-python · GitHub
Open

XunfunLee opened this issue Apr 27, 2024 · 11 comments

Comments

@XunfunLee

I am trying to install the latest version of llama-cpp-python on Windows 11 with an RTX 3090 Ti (24 GB). I successfully installed llama-cpp-python==0.1.87 (can't remember the exact version) months ago using:

set FORCE_CMAKE=1
set CMAKE_ARGS=-DLLAMA_CUBLAS=on
pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir

But when I recently tried to install the latest version using:

set CMAKE_ARGS="-DLLAMA_CUDA=on"
pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir

After loading the model, it is still running on the CPU, with BLAS = 0 (or does a different flag indicate GPU support in the new version?).
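For reference, this is roughly how I load the model (a minimal sketch; the model path is a placeholder rather than my exact script), which produces the output below:

from llama_cpp import Llama

# Minimal loading sketch; the model path is a placeholder.
llm = Llama(
    model_path="./Meta-Llama-3-8B-Instruct.gguf",
    n_ctx=2048,
    n_batch=1024,
    n_gpu_layers=-1,  # request that all layers be offloaded to the GPU
    verbose=True,     # prints the BLAS / buffer lines shown below
)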

llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: n_batch    = 1024
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: freq_base  = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:        CPU KV buffer size =   256.00 MiB
llama_new_context_with_model: KV self size  =  256.00 MiB, K (f16):  128.00 MiB, V (f16):  128.00 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.49 MiB
llama_new_context_with_model:        CPU compute buffer size =   258.50 MiB
llama_new_context_with_model: graph nodes  = 1030
llama_new_context_with_model: graph splits = 1
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
Model metadata: {'general.name': 'Meta-Llama-3-8B-Instruct-imatrix', 'general.architecture': 'llama', 'llama.block_count': '32', 'llama.context_length': '8192', 'tokenizer.ggml.eos_token_id': '128001', 'general.file_type': '18', 'llama.attention.head_count_kv': '8', 'llama.embedding_length': '4096', 'llama.feed_forward_length': '14336', 'llama.attention.head_count': '32', 'llama.rope.freq_base': '500000.000000', 'llama.attention.layer_norm_rms_epsilon': '0.000010', 'llama.vocab_size': '128256', 'llama.rope.dimension_count': '128', 'tokenizer.ggml.model': 'gpt2', 'general.quantization_version': '2', 'tokenizer.ggml.bos_token_id': '128000', 'tokenizer.chat_template': "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}"}
Using gguf chat template: {% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>

I have also tried the pre-built wheel for CUDA 12.1 (pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121) and it still doesn't work. I added --verbose to see the output:

loading initial cache file C:\Users\Administrator\AppData\Local\Temp\tmpbbvy3nqu\build\CMakeInit.txt
  -- Building for: Visual Studio 17 2022
  -- Selecting Windows SDK version 10.0.22621.0 to target Windows 10.0.22631.
  -- The C compiler identification is MSVC 19.39.33523.0
  -- The CXX compiler identification is MSVC 19.39.33523.0
  -- Detecting C compiler ABI info
  -- Detecting C compiler ABI info - done
  -- Check for working C compiler: F:/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe - skipped
  -- Detecting C compile features
  -- Detecting C compile features - done
  -- Detecting CXX compiler ABI info
  -- Detecting CXX compiler ABI info - done
  -- Check for working CXX compiler: F:/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe - skipped
  -- Detecting CXX compile features
  -- Detecting CXX compile features - done
  -- Found Git: F:/Git/cmd/git.exe (found version "2.44.0.windows.1")
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
  -- Looking for pthread_create in pthreads
  -- Looking for pthread_create in pthreads - not found
  -- Looking for pthread_create in pthread
  -- Looking for pthread_create in pthread - not found
  -- Found Threads: TRUE
  -- Warning: ccache not found - consider installing it for faster compilation or disable this warning with LLAMA_CCACHE=OFF
  -- CMAKE_SYSTEM_PROCESSOR: AMD64
  -- CMAKE_GENERATOR_PLATFORM: x64
  -- x86 detected
  -- Performing Test HAS_AVX_1
  -- Performing Test HAS_AVX_1 - Success
  -- Performing Test HAS_AVX2_1
  -- Performing Test HAS_AVX2_1 - Success
  -- Performing Test HAS_FMA_1
  -- Performing Test HAS_FMA_1 - Success
  -- Performing Test HAS_AVX512_1
  -- Performing Test HAS_AVX512_1 - Failed
  -- Performing Test HAS_AVX512_2
  -- Performing Test HAS_AVX512_2 - Failed

Environment

python=3.12
C++ compiler: Visual Studio 2022 (with the necessary C++ workloads)
cmake --version = 3.29.2
nvcc -V = CUDA 12.1 (nvidia-smi reports CUDA 12.3, but I don't think that is related to this issue)

I have downloaded and installed VS2022, the CUDA toolkit, CMake, and Anaconda, and I am wondering if some steps are missing. Based on my previous experience there should be no need to git clone this repo and cd into it to build (though I did that on my Mac months ago to convert a .pth file to a .bin file).

My system variables are listed below (a quick check sketch follows the list):

  • F:\Anaconda\Scripts
  • F:\CMake\bin
  • C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1
  • C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin
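A quick sanity check of what the build actually sees (a rough sketch; run it from the same Anaconda prompt used for pip, and note that cl.exe is normally only on PATH inside a Developer Command Prompt):

import os
import shutil

# Rough check that the CUDA toolkit and build tools are visible
# to the environment that runs pip / CMake.
print("CUDA_PATH:", os.environ.get("CUDA_PATH"))
print("nvcc:", shutil.which("nvcc"))
print("cmake:", shutil.which("cmake"))
print("cl.exe:", shutil.which("cl"))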

Questions

  1. Are there any steps I am missing to build llama-cpp-python with GPU support?
  2. How can I tell whether the build has GPU support right after running pip install llama-cpp-python, instead of loading a model and checking for BLAS = 1? (See the sketch below.)
  3. Do I need to git clone this repo and cd into it, or create any files or directories, before running pip install llama-cpp-python?
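For question 2, this is roughly what I am hoping is possible (a sketch assuming the low-level binding llama_supports_gpu_offload is exposed by the installed version; I am not sure every release has it):

from llama_cpp import llama_cpp

# Should return True if the library was built with CUDA support,
# without having to load a model first.
print(llama_cpp.llama_supports_gpu_offload())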
@XunfunLee
Author

I checked #1352; is there a known issue related to Windows 11? I had assumed this was a problem with my installation steps or my machine. Is there an official explanation, please?

@holchan
holchan commented Apr 27, 2024

I'm having the same problem but on Linux (Ubuntu 20.04), using a Kaggle Notebook; it worked fine until yesterday.

edit: pip install llama-cpp-python==0.2.64 solves the problem.

@XunfunLee
Author

I'm having the same problem but on Linux (Ubuntu 20.04), using a Kaggle Notebook; it worked fine until yesterday.

edit: pip install llama-cpp-python==0.2.64 solves the problem.

Still not working. I have tried 0.2.64, 0.2.60, and 0.2.59 several times, and the build log says:

Creating "ggml_shared.dir\Release\ggml_shared.tlog\unsuccessfulbuild" because "AlwaysCreate" was specified.
    Touching "ggml_shared.dir\Release\ggml_shared.tlog\unsuccessfulbuild".
  CustomBuild:
    Building Custom Rule C:/Users/Administrator/AppData/Local/Temp/pip-install-_thkprn2/llama-cpp-python_9fa670d7909f4acfb3ac1882363d1df6/vendor/llama.cpp/CMakeLists.txt

@VinzanR
VinzanR commented Apr 28, 2024

The llama.dll is Win32 and we are on 64-bit Windows 11. If I build the C++ checker program below as Win32, llama.dll loads successfully, but as 64-bit it does not.

#include <iostream>
#include <windows.h>

int main() {
    // Update this path to the actual location of llama.dll
    HINSTANCE hDLL = LoadLibrary(TEXT("C:\\my_path\\llama-cpp-python\\llama_cpp\\llama.dll"));

    if (hDLL == NULL) {
        std::cerr << "ERROR: unable to load DLL" << std::endl;
        return 1;
    }

    std::cout << "DLL loaded successfully" << std::endl;

    FreeLibrary(hDLL);
    return 0;
}

@XunfunLee
Author

The llama.dll is Win32 and we are on 64-bit Windows 11. If I build the C++ checker program as Win32, llama.dll loads successfully, but as 64-bit it does not.

Well, I think I understand why, but I still don't know how to fix the problem. Can you give more info or steps, please?

@parveen232

The CUDA version I'm using is v12.4, on Windows 10; I think it will also work with Windows 11.

I have tried this from Windows PowerShell and it works for me:

$env:CMAKE_ARGS="-DLLAMA_CUBLAS=on"
$env:CUDACXX="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin\nvcc.exe"
pip install --upgrade --force-reinstall --no-cache-dir llama-cpp-python

@MarianoMolina

I'm having the same issue. I have CUDA installed, nvcc works, and CUDA_PATH is set.
Doing:
set CMAKE_ARGS=-DLLAMA_CUDA=ON
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose

I don't see any errors in the installation, yet when I run it I get BLAS = 0:

AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |

Getting the same result with:
set CMAKE_ARGS=-DLLAMA_CUBLAS=ON
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose

@XunfunLee
Author

The CUDA version I'm using is v12.4, on Windows 10; I think it will also work with Windows 11.

I have tried this from Windows PowerShell and it works for me:

$env:CMAKE_ARGS="-DLLAMA_CUBLAS=on"
$env:CUDACXX="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin\nvcc.exe"
pip install --upgrade --force-reinstall --no-cache-dir llama-cpp-python

It seems to be a Windows 11 problem. I got it working on Windows 10 months ago (with llama-cpp-python==0.1.72), but with the latest version on Windows 11 it doesn't work :(

@XunfunLee
Author

I'm having the same issue. I have CUDA installed, nvcc works, and CUDA_PATH is set. Doing:
set CMAKE_ARGS=-DLLAMA_CUDA=ON
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose

I don't see any errors in the installation, yet when I run it I get BLAS = 0:

AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |

Getting the same result with:
set CMAKE_ARGS=-DLLAMA_CUBLAS=ON
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose

Yep, if you get it working, please let me know :) I will keep trying to find a solution as well.

@MarianoMolina

Although building from source didn't work, I was able to install the prebuilt wheel and it works; I am now getting GPU inference. It does seem like there is an issue with my environment in some way.

@Granddyser
Granddyser commented May 18, 2025

I made a guide; maybe it's helpful: https://github.com/Granddyser/windows-llama-cpp-python-cuda-guide
