Build error with LLGUIDANCE · Issue #2107 · abetlen/llama-cpp-python · GitHub

Build error with LLGUIDANCE #2107

@JeremyBickel

Description


It doesn't build when I enable LLGUIDANCE (-DLLAMA_LLGUIDANCE=ON), but it does build without it.

Here are the top and bottom of the failing build log, followed by the log from a successful build without LLAMA_LLGUIDANCE.
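
For reference, a condensed view of the two invocations (full commands and logs follow; the CUDA-related flags are just my local setup):

# fails: editable install with LLGUIDANCE enabled
CMAKE_ARGS="-DGGML_CUDA=on -DLLAMA_LLGUIDANCE=ON ..." FORCE_CMAKE=1 pip install --upgrade -e .

# builds: install from git without LLGUIDANCE
CMAKE_ARGS="-DGGML_CUDA=on ..." uv pip install --upgrade --force-reinstall git+https://github.com/abetlen/llama-cpp-python.git --no-cache-dir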

╰─❯ CMAKE_ARGS="-DGGML_CUDA=on -DLLAMA_LLGUIDANCE=ON -DCMAKE_CUDA_COMPILER=$(which nvcc) -DCMAKE_CUDA_ARCHITECTURES=86 -DCUDA_TOOLKIT_ROOT_DIR=/opt/cuda" FORCE_CMAKE=1 pip install --upgrade -e .
Obtaining file:///run/media/jeremy/2TB/llama-cpp-python
Installing build dependencies ... done
Checking if build backend supports build_editable ... done
Getting requirements to build editable ... done
Preparing editable metadata (pyproject.toml) ... done
Requirement already satisfied: typing-extensions>=4.5.0 in .venv/lib/python3.10/site-packages (from llama_cpp_python==0.3.16) (4.15.0)
Requirement already satisfied: numpy>=1.20.0 in .venv/lib/python3.10/site-packages (from llama_cpp_python==0.3.16) (2.2.6)
Requirement already satisfied: diskcache>=5.6.1 in .venv/lib/python3.10/site-packages (from llama_cpp_python==0.3.16) (5.6.3)
Requirement already satisfied: jinja2>=2.11.3 in .venv/lib/python3.10/site-packages (from llama_cpp_python==0.3.16) (3.1.6)
Requirement already satisfied: MarkupSafe>=2.0 in .venv/lib/python3.10/site-packages (from jinja2>=2.11.3->llama_cpp_python==0.3.16) (3.0.3)
Building wheels for collected packages: llama_cpp_python
Building editable for llama_cpp_python (pyproject.toml) ... error
error: subprocess-exited-with-error

× Building editable for llama_cpp_python (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [228 lines of output]
*** scikit-build-core 0.11.6 using CMake 4.2.1 (editable)
*** Configuring CMake...
loading initial cache file /tmp/tmpwehh904h/build/CMakeInit.txt
-- The C compiler identification is GNU 15.2.1
-- The CXX compiler identification is GNU 15.2.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMAKE_BUILD_TYPE=Release
-- Found Git: /usr/bin/git (found version "2.52.0")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- GGML_SYSTEM_ARCH: x86
-- Including CPU backend
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- x86 detected
-- Adding CPU backend variant ggml-cpu: -march=native
-- Found CUDAToolkit: /opt/cuda/targets/x86_64-linux/include;/opt/cuda/targets/x86_64-linux/include/cccl (found version "13.0.88")
-- CUDA Toolkit found
-- Using CUDA architectures: 86
-- The CUDA compiler identification is NVIDIA 13.0.88 with host compiler GNU 15.2.1
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /opt/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- CUDA host compiler is GNU 15.2.1
-- Including CUDA backend
-- ggml version: 0.0.6170
-- ggml commit: 4227c9be4
  CMake Warning (dev) at CMakeLists.txt:13 (install):
    Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
  Call Stack (most recent call first):
    CMakeLists.txt:108 (llama_cpp_python_install_target)
  This warning is for project developers.  Use -Wno-dev to suppress it.

  CMake Warning (dev) at CMakeLists.txt:21 (install):
    Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
  Call Stack (most recent call first):
    CMakeLists.txt:108 (llama_cpp_python_install_target)
  This warning is for project developers.  Use -Wno-dev to suppress it.

  CMake Warning (dev) at CMakeLists.txt:13 (install):
    Target ggml has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
  Call Stack (most recent call first):
    CMakeLists.txt:109 (llama_cpp_python_install_target)
  This warning is for project developers.  Use -Wno-dev to suppress it.

  CMake Warning (dev) at CMakeLists.txt:21 (install):
    Target ggml has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
  Call Stack (most recent call first):
    CMakeLists.txt:109 (llama_cpp_python_install_target)
  This warning is for project developers.  Use -Wno-dev to suppress it.

  CMake Warning (dev) at CMakeLists.txt:13 (install):
    Target mtmd has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
  Call Stack (most recent call first):
    CMakeLists.txt:162 (llama_cpp_python_install_target)
  This warning is for project developers.  Use -Wno-dev to suppress it.

  CMake Warning (dev) at CMakeLists.txt:21 (install):
    Target mtmd has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
  Call Stack (most recent call first):
    CMakeLists.txt:162 (llama_cpp_python_install_target)
  This warning is for project developers.  Use -Wno-dev to suppress it.

  -- Configuring done (2.9s)
  -- Generating done (0.0s)
  CMake Warning:
    Manually-specified variables were not used by the project:

      CUDA_TOOLKIT_ROOT_DIR


  -- Build files have been written to: /tmp/tmpwehh904h/build
  *** Building project with Ninja...
  Change Dir: '/tmp/tmpwehh904h/build'

  Run Build Command(s): /usr/bin/ninja -v
  [1/191] cd /tmp/tmpwehh904h/build/vendor/llama.cpp/common && /usr/bin/cmake -Dcfgdir= -P /tmp/tmpwehh904h/build/llguidance/tmp/llguidance_ext-mkdirs.cmake && /usr/bin/cmake -E touch /tmp/tmpwehh904h/build/llguidance/src/llguidance_ext-stamp/llguidance_ext-mkdir
  [2/191] /usr/bin/c++  -pthread -DGGML_BUILD -DGGML_COMMIT=\"4227c9be4\" -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_VERSION=\"0.0.6170\" -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/run/media/jeremy/2TB/llama-cpp-python/vendor/llama.cpp/ggml/src/. -I/run/media/jeremy/2TB/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o -c /run/media/jeremy/2TB/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-threading.cpp

...

[127/191] /opt/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/run/media/jeremy/2TB/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/run/media/jeremy/2TB/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /opt/cuda/targets/x86_64-linux/include -isystem /opt/cuda/targets/x86_64-linux/include/cccl -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_86,code=[compute_86,sm_86]" -Xcompiler=-fPIC -use_fast_math -extended-lambda -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-q8_0.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-q8_0.cu.o.d -x cu -c /run/media/jeremy/2TB/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/template-instances/mmq-instance-q8_0.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-q8_0.cu.o
[128/191] /opt/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/run/media/jeremy/2TB/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/run/media/jeremy/2TB/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /opt/cuda/targets/x86_64-linux/include -isystem /opt/cuda/targets/x86_64-linux/include/cccl -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_86,code=[compute_86,sm_86]" -Xcompiler=-fPIC -use_fast_math -extended-lambda -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-q6_k.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-q6_k.cu.o.d -x cu -c /run/media/jeremy/2TB/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/template-instances/mmq-instance-q6_k.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-q6_k.cu.o
ninja: build stopped: subcommand failed.

  *** CMake build failed
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building editable for llama_cpp_python
Failed to build llama_cpp_python
error: failed-wheel-build-for-install

× Failed to build installable wheels for some pyproject.toml based projects
╰─> llama_cpp_python



╰─❯ CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_COMPILER=$(which nvcc) -DCMAKE_CUDA_ARCHITECTURES=86" uv pip install --upgrade --force-reinstall git+https://github.com/abetlen/llama-cpp-python.git --no-cache-dir


    Updated https://github.com/abetlen/llama-cpp-python.git (c37132bac860fcc333255c36313f89c4f49d4c8d)
Resolved 6 packages in 31.97s
      Built llama-cpp-python @ git+https://github.com/abetlen/llama-cpp-python.git@c37132bac860fcc333255c36313f89c4f49d4c8d
Prepared 6 packages in 6m 36s
Uninstalled 6 packages in 32ms
░░░░░░░░░░░░░░░░░░░░ [0/6] Installing wheels...
warning: Failed to hardlink files; falling back to full copy. This may lead to degraded performance.
         If the cache and target directories are on different filesystems, hardlinking may not be supported.
         If this is intentional, set `export UV_LINK_MODE=copy` or use `--link-mode=copy` to suppress this warning.
Installed 6 packages in 59ms
 ~ diskcache==5.6.3
 ~ jinja2==3.1.6
 - llama-cpp-python==0.3.16
 + llama-cpp-python==0.3.16 (from git+https://github.com/abetlen/llama-cpp-python.git@c37132bac860fcc333255c36313f89c4f49d4c8d)
 ~ markupsafe==3.0.3
 ~ numpy==2.2.6
 ~ typing-extensions==4.15.0
