Can't install GPU version for windows for many times. · Issue #1393 · abetlen/llama-cpp-python · GitHub
Open

XunfunLee opened this issue Apr 27, 2024 · 11 comments

Comments

@XunfunLee

I am trying to install the latest version of llama-cpp-python on Windows 11 with an RTX 3090 Ti (24 GB). I successfully installed llama-cpp-python==0.1.87 (can't remember the exact version) months ago using:

set FORCE_CMAKE=1
set CMAKE_ARGS=-DLLAMA_CUBLAS=on
pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir

But when I recently tried to install the latest version using:

set CMAKE_ARGS="-DLLAMA_CUDA=on"
pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir

After loading the model, it is still running on the CPU, with BLAS = 0 (or does a different flag indicate GPU support in the new version?).
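For reference, this is roughly how I load the model (a minimal sketch; the model path is a placeholder rather than my exact script), which produces the output below:

from llama_cpp import Llama

# Minimal loading sketch; the model path is a placeholder.
llm = Llama(
    model_path="./Meta-Llama-3-8B-Instruct.gguf",
    n_ctx=2048,
    n_batch=1024,
    n_gpu_layers=-1,  # request that all layers be offloaded to the GPU
    verbose=True,     # prints the BLAS / buffer lines shown below
)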

llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: n_batch    = 1024
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: freq_base  = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:        CPU KV buffer size =   256.00 MiB
llama_new_context_with_model: KV self size  =  256.00 MiB, K (f16):  128.00 MiB, V (f16):  128.00 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.49 MiB
llama_new_context_with_model:        CPU compute buffer size =   258.50 MiB
llama_new_context_with_model: graph nodes  = 1030
llama_new_context_with_model: graph splits = 1
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
Model metadata: {'general.name': 'Meta-Llama-3-8B-Instruct-imatrix', 'general.architecture': 'llama', 'llama.block_count': '32', 'llama.context_length': '8192', 'tokenizer.ggml.eos_token_id': '128001', 'general.file_type': '18', 'llama.attention.head_count_kv': '8', 'llama.embedding_length': '4096', 'llama.feed_forward_length': '14336', 'llama.attention.head_count': '32', 'llama.rope.freq_base': '500000.000000', 'llama.attention.layer_norm_rms_epsilon': '0.000010', 'llama.vocab_size': '128256', 'llama.rope.dimension_count': '128', 'tokenizer.ggml.model': 'gpt2', 'general.quantization_version': '2', 'tokenizer.ggml.bos_token_id': '128000', 'tokenizer.chat_template': "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}"}
Using gguf chat template: {% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>

I have also tried the pre-built wheel for CUDA 12.1 (pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121) and it still doesn't work. I added --verbose to see the output:

loading initial cache file C:\Users\Administrator\AppData\Local\Temp\tmpbbvy3nqu\build\CMakeInit.txt
  -- Building for: Visual Studio 17 2022
  -- Selecting Windows SDK version 10.0.22621.0 to target Windows 10.0.22631.
  -- The C compiler identification is MSVC 19.39.33523.0
  -- The CXX compiler identification is MSVC 19.39.33523.0
  -- Detecting C compiler ABI info
  -- Detecting C compiler ABI info - done
  -- Check for working C compiler: F:/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe - skipped
  -- Detecting C compile features
  -- Detecting C compile features - done
  -- Detecting CXX compiler ABI info
  -- Detecting CXX compiler ABI info - done
  -- Check for working CXX compiler: F:/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe - skipped
  -- Detecting CXX compile features
  -- Detecting CXX compile features - done
  -- Found Git: F:/Git/cmd/git.exe (found version "2.44.0.windows.1")
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
  -- Looking for pthread_create in pthreads
  -- Looking for pthread_create in pthreads - not found
  -- Looking for pthread_create in pthread
  -- Looking for pthread_create in pthread - not found
  -- Found Threads: TRUE
  -- Warning: ccache not found - consider installing it for faster compilation or disable this warning with LLAMA_CCACHE=OFF
  -- CMAKE_SYSTEM_PROCESSOR: AMD64
  -- CMAKE_GENERATOR_PLATFORM: x64
  -- x86 detected
  -- Performing Test HAS_AVX_1
  -- Performing Test HAS_AVX_1 - Success
  -- Performing Test HAS_AVX2_1
  -- Performing Test HAS_AVX2_1 - Success
  -- Performing Test HAS_FMA_1
  -- Performing Test HAS_FMA_1 - Success
  -- Performing Test HAS_AVX512_1
  -- Performing Test HAS_AVX512_1 - Failed
  -- Performing Test HAS_AVX512_2
  -- Performing Test HAS_AVX512_2 - Failed

Environment

python=3.12
C++ compiler: Visual Studio 2022 (with the necessary C++ workloads)
cmake --version = 3.29.2
nvcc -V = CUDA 12.1 (nvidia-smi reports CUDA 12.3, but I don't think that is related to this issue)

I have downloaded and installed VS2022, the CUDA toolkit, CMake, and Anaconda, and I am wondering if some steps are missing. Based on my previous experience there should be no need to git clone this repo and cd into it to build (though I did that on my Mac months ago to convert a .pth file to a .bin file).

My system variables are listed below (a quick check sketch follows the list):

  • F:\Anaconda\Scripts
  • F:\CMake\bin
  • C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1
  • C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin
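A quick sanity check of what the build actually sees (a rough sketch; run it from the same Anaconda prompt used for pip, and note that cl.exe is normally only on PATH inside a Developer Command Prompt):

import os
import shutil

# Rough check that the CUDA toolkit and build tools are visible
# to the environment that runs pip / CMake.
print("CUDA_PATH:", os.environ.get("CUDA_PATH"))
print("nvcc:", shutil.which("nvcc"))
print("cmake:", shutil.which("cmake"))
print("cl.exe:", shutil.which("cl"))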

Questions

  1. Are there any steps I am missing to build llama-cpp-python with GPU support?
  2. How can I tell whether the build has GPU support right after running pip install llama-cpp-python, instead of loading a model and checking for BLAS = 1? (See the sketch below.)
  3. Do I need to git clone this repo and cd into it, or create any files or directories, before running pip install llama-cpp-python?
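For question 2, this is roughly what I am hoping is possible (a sketch assuming the low-level binding llama_supports_gpu_offload is exposed by the installed version; I am not sure every release has it):

from llama_cpp import llama_cpp

# Should return True if the library was built with CUDA support,
# without having to load a model first.
print(llama_cpp.llama_supports_gpu_offload())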
@XunfunLee
Author

I checked #1352; is there a known issue related to Windows 11? I had assumed this was a problem with my installation steps or my machine. Is there an official explanation, please?

@holchan
holchan commented Apr 27, 2024

I'm having the same problem but on Linux (Ubuntu 20.04), using a Kaggle Notebook; it worked fine until yesterday.

edit: pip install llama-cpp-python==0.2.64 solves the problem.

@XunfunLee
Author

I'm having the same problem but on Linux (Ubuntu 20.04), using a Kaggle Notebook; it worked fine until yesterday.

edit: pip install llama-cpp-python==0.2.64 solves the problem.

Still not working. I have tried 0.2.64, 0.2.60, and 0.2.59 several times, and the build log says:

Creating "ggml_shared.dir\Release\ggml_shared.tlog\unsuccessfulbuild" because "AlwaysCreate" was specified.
    Touching "ggml_shared.dir\Release\ggml_shared.tlog\unsuccessfulbuild".
  CustomBuild:
    Building Custom Rule C:/Users/Administrator/AppData/Local/Temp/pip-install-_thkprn2/llama-cpp-python_9fa670d7909f4acfb3ac1882363d1df6/vendor/llama.cpp/CMakeLists.txt

@VinzanR
VinzanR commented Apr 28, 2024

The llama.dll is Win32 and we are on 64-bit Windows 11. If I build the C++ checker program below as Win32, llama.dll loads successfully, but as 64-bit it does not.

#include <iostream>
#include <windows.h>

int main() {
    // Update this path to the actual location of llama.dll
    HINSTANCE hDLL = LoadLibrary(TEXT("C:\\my_path\\llama-cpp-python\\llama_cpp\\llama.dll"));

    if (hDLL == NULL) {
        std::cerr << "ERROR: unable to load DLL" << std::endl;
        return 1;
    }

    std::cout << "DLL loaded successfully" << std::endl;

    FreeLibrary(hDLL);
    return 0;
}

@XunfunLee
Author

The llama.dll is Win32 and we are on 64-bit Windows 11. If I build the C++ checker program as Win32, llama.dll loads successfully, but as 64-bit it does not.

Well, I think I understand why, but I still don't know how to fix the problem. Can you give more info or steps, please?

@parveen232

The CUDA version I'm using is v12.4, on Windows 10; I think it will also work with Windows 11.

I have tried this from Windows PowerShell and it works for me:

$env:CMAKE_ARGS="-DLLAMA_CUBLAS=on"
$env:CUDACXX="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin\nvcc.exe"
pip install --upgrade --force-reinstall --no-cache-dir llama-cpp-python

@MarianoMolina

I'm having the same issue. I have CUDA installed, nvcc works, and CUDA_PATH is set.
Doing:
set CMAKE_ARGS=-DLLAMA_CUDA=ON
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose

I don't see any errors in the installation, yet when I run it I get BLAS = 0:

AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |

Getting the same result with:
set CMAKE_ARGS=-DLLAMA_CUBLAS=ON
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose

@XunfunLee
Author

The CUDA version I'm using is v12.4, on Windows 10; I think it will also work with Windows 11.

I have tried this from Windows PowerShell and it works for me:

$env:CMAKE_ARGS="-DLLAMA_CUBLAS=on"
$env:CUDACXX="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin\nvcc.exe"
pip install --upgrade --force-reinstall --no-cache-dir llama-cpp-python

It seems to be a Windows 11 problem. I got it working on Windows 10 months ago (with llama-cpp-python==0.1.72), but with the latest version on Windows 11 it doesn't work :(

@XunfunLee
Author

I'm having the same issue. I have CUDA installed, nvcc works, and CUDA_PATH is set. Doing:
set CMAKE_ARGS=-DLLAMA_CUDA=ON
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose

I don't see any errors in the installation, yet when I run it I get BLAS = 0:

AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |

Getting the same result with:
set CMAKE_ARGS=-DLLAMA_CUBLAS=ON
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose

Yep, if you get it working, please let me know :) I will keep trying to find a solution as well.

@MarianoMolina

Although building from source didn't work, I was able to install the prebuilt wheel and it works; I am now getting GPU inference. It does seem like there is an issue with my environment in some way.

@Granddyser
Granddyser commented May 18, 2025

I made a guide; maybe it's helpful: https://github.com/Granddyser/windows-llama-cpp-python-cuda-guide
