Torch 2.6.0 cu126 is missing several dependencies in the METADATA-file #146679


Closed
anates opened this issue Feb 7, 2025 · 7 comments
Labels: high priority; module: regression (It used to work, and now it doesn't); oncall: releng (In support of CI and Release Engineering); triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@anates
anates commented Feb 7, 2025

🐛 Describe the bug

When upgrading from torch-2.6.0+cu124 to torch-2.6.0+cu126 on unix, several dependencies are lost in the METADATA-file:

For cu124 the following packages exist:

Requires-Dist: nvidia-cuda-nvrtc-cu12 (==12.4.127) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cuda-runtime-cu12 (==12.4.127) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cuda-cupti-cu12 (==12.4.127) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cudnn-cu12 (==9.1.0.70) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cublas-cu12 (==12.4.5.8) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cufft-cu12 (==11.2.1.3) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-curand-cu12 (==10.3.5.147) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cusolver-cu12 (==11.6.1.9) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cusparse-cu12 (==12.3.1.170) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cusparselt-cu12 (==0.6.2) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-nccl-cu12 (==2.21.5) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-nvtx-cu12 (==12.4.127) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-nvjitlink-cu12 (==12.4.127) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: triton (==3.2.0) ; platform_system == "Linux" and platform_machine == "x86_64"

However, for cu126 these are no longer available in the unix-builds, only in the windows-based builds. This leads to issues such as python-poetry/poetry#10152 (comment)
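
One way to check what a given wheel declares is to read the `Requires-Dist` headers straight from its METADATA file. The following is a minimal sketch, assuming the wheel has already been downloaded locally; the helper name `requires_dist` is made up for illustration and is not part of pip, poetry, or the PyTorch tooling.

```python
import zipfile
from email.parser import Parser


def requires_dist(wheel_path):
    """Return the Requires-Dist entries declared in a wheel's METADATA file.

    `wheel_path` may be a file system path or a binary file-like object.
    """
    with zipfile.ZipFile(wheel_path) as wheel:
        # A wheel stores its metadata in <name>-<version>.dist-info/METADATA.
        metadata_name = next(
            name for name in wheel.namelist()
            if name.endswith(".dist-info/METADATA")
        )
        text = wheel.read(metadata_name).decode("utf-8")
    # METADATA uses RFC 822 style headers, so the stdlib email parser can read it.
    return Parser().parsestr(text).get_all("Requires-Dist") or []
```

Running this against the cu124 and cu126 wheels for the same platform tag would show whether the dependency lists really differ, or whether a different wheel was inspected.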

Versions

Not relevant for this bug/issue

cc @ezyang @gchanan @zou3519 @kadeng @msaroufim

@malfet
Contributor
malfet commented Feb 7, 2025

Hi-pri to get a repro and update metadata files
cc: @atalman

@atalman atalman self-assigned this Feb 7, 2025
@atalman
Contributor
atalman commented Feb 10, 2025

For the CUDA 12.6 Linux builds, this is the metadata:

Requires-Dist: nvidia-cuda-nvrtc-cu12==12.6.77; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cuda-runtime-cu12==12.6.77; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cuda-cupti-cu12==12.6.80; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cudnn-cu12==9.5.1.17; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cublas-cu12==12.6.4.1; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cufft-cu12==11.3.0.4; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-curand-cu12==10.3.7.77; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cusolver-cu12==11.7.1.2; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cusparse-cu12==12.5.4.2; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cusparselt-cu12==0.6.3; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-nccl-cu12==2.21.5; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-nvtx-cu12==12.6.77; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-nvjitlink-cu12==12.6.85; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: triton==3.2.0; platform_system == "Linux" and platform_machine == "x86_64"

Taken from: https://download.pytorch.org/whl/cu126/torch-2.6.0%2Bcu126-cp311-cp311-manylinux_2_28_x86_64.whl
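
The cu124 lines quoted earlier in this thread are written as `name (==version) ; marker`, while the cu126 lines use `name==version; marker`. Both spellings describe the same kind of requirement, so a textual diff of the two METADATA files can overstate the difference. Below is a rough sketch of a normalizer for comparing them; the function name `normalize` is hypothetical, and it only handles the simple `name`, version specifier, and marker shape seen in this thread (no extras or URL requirements).

```python
import re


def normalize(requirement):
    """Split a Requires-Dist value into (name, specifier, marker).

    Accepts both 'pkg (==1.0) ; marker' and 'pkg==1.0; marker'.
    """
    head, _, marker = requirement.partition(";")
    # Drop parentheses and spaces so both spellings collapse to 'pkg==1.0'.
    head = head.replace("(", "").replace(")", "").replace(" ", "")
    match = re.match(r"[A-Za-z0-9._-]+", head)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    name = match.group(0).lower()
    specifier = head[match.end():]
    return name, specifier, marker.strip()
```

With this, the `nvidia-nccl-cu12` and `triton` entries from the two lists compare as equal, while entries such as `nvidia-cudnn-cu12` differ only in the pinned version.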

@anates Please note torch-2.6.0+cu126-cp311-cp311-linux_aarch64 wheel is missing these dependencies since it packages all the dependencies inside the wheel in torch.lib folder:

total 7529192
      0 drwxr-xr-x@  41 atalman  staff       1312 25 Jan 00:12 .
      0 drwxr-xr-x@ 111 atalman  staff       3552 10 Feb 10:21 ..
  28504 -rwxr-xr-x@   1 atalman  staff   14590833 25 Jan 00:12 libarm_compute.so
   2440 -rwxr-xr-x@   1 atalman  staff    1245833 25 Jan 00:12 libarm_compute_graph.so
   2376 -rwxr-xr-x@   1 atalman  staff    1216352 24 Jan 19:16 libc10.so
   1328 -rwxr-xr-x@   1 atalman  staff     678456 24 Jan 19:16 libc10_cuda.so
    144 -rwxr-xr-x@   1 atalman  staff      72016 24 Jan 19:16 libcaffe2_nvrtc.so
 206288 -rwxr-xr-x@   1 atalman  staff  105619232 25 Jan 00:11 libcublas.so.12
 945520 -rwxr-xr-x@   1 atalman  staff  470616952 25 Jan 00:11 libcublasLt.so.12
   1448 -rwxr-xr-x@   1 atalman  staff     738377 25 Jan 00:11 libcudart.so.12
    256 -rwxr-xr-x@   1 atalman  staff     129016 25 Jan 00:11 libcudnn.so.9
 473832 -rwxr-xr-x@   1 atalman  staff  242599880 25 Jan 00:11 libcudnn_adv.so.9
   8896 -rwxr-xr-x@   1 atalman  staff    4551208 25 Jan 00:11 libcudnn_cnn.so.9
 848024 -rwxr-xr-x@   1 atalman  staff  429878808 25 Jan 00:11 libcudnn_engines_precompiled.so.9
  37648 -rwxr-xr-x@   1 atalman  staff   19274896 25 Jan 00:11 libcudnn_engines_runtime_compiled.so.9
   5400 -rwxr-xr-x@   1 atalman  staff    2761136 25 Jan 00:11 libcudnn_graph.so.9
  93112 -rwxr-xr-x@   1 atalman  staff   47670808 25 Jan 00:12 libcudnn_heuristic.so.9
 210480 -rwxr-xr-x@   1 atalman  staff  107762976 25 Jan 00:11 libcudnn_ops.so.9
 540160 -rwxr-xr-x@   1 atalman  staff  276561312 25 Jan 00:11 libcufft.so.11
  13976 -rwxr-xr-x@   1 atalman  staff    7155345 25 Jan 00:11 libcupti.so.12
 188088 -rwxr-xr-x@   1 atalman  staff   96300248 25 Jan 00:11 libcurand.so.10
 290456 -rwxr-xr-x@   1 atalman  staff  148713192 25 Jan 00:11 libcusolver.so.11
 573280 -rwxr-xr-x@   1 atalman  staff  289733120 25 Jan 00:11 libcusparse.so.12
 459920 -rw-r--r--@   1 atalman  staff  235478385 25 Jan 00:11 libcusparseLt.so.0
   2904 -rwxr-xr-x@   1 atalman  staff    1485025 25 Jan 00:12 libgfortran.so.5
    672 -rwxr-xr-x@   1 atalman  staff     343913 25 Jan 00:12 libgomp.so.1
  91416 -rwxr-xr-x@   1 atalman  staff   46801529 25 Jan 00:11 libnvJitLink.so.12
    136 -rwxr-xr-x@   1 atalman  staff      68321 25 Jan 00:11 libnvToolsExt.so.1
   4928 -rwxr-xr-x@   1 atalman  staff    2522080 10 Dec 19:39 libnvpl_blas_core.so.0
    880 -rwxr-xr-x@   1 atalman  staff     448400 10 Dec 19:39 libnvpl_blas_lp64_gomp.so.0
   3344 -rw-r--r--@   1 atalman  staff    1710521 25 Jan 00:12 libnvpl_lapack_core.so.0
  11424 -rw-r--r--@   1 atalman  staff    5848968 10 Dec 19:39 libnvpl_lapack_lp64_gomp.so.0
  10504 -rwxr-xr-x@   1 atalman  staff    5375561 25 Jan 00:11 libnvrtc-builtins.so.12.6
 106128 -rwxr-xr-x@   1 atalman  staff   54335681 25 Jan 00:11 libnvrtc.so.12
    160 -rwxr-xr-x@   1 atalman  staff      79592 24 Jan 20:00 libshm.so
    152 -rwxr-xr-x@   1 atalman  staff      75224 25 Jan 00:05 libtorch.so
 454480 -rwxr-xr-x@   1 atalman  staff  232692632 24 Jan 20:00 libtorch_cpu.so
1860352 -rwxr-xr-x@   1 atalman  staff  943158448 25 Jan 00:05 libtorch_cuda.so
   1208 -rwxr-xr-x@   1 atalman  staff     616256 25 Jan 00:05 libtorch_cuda_linalg.so
    144 -rwxr-xr-x@   1 atalman  staff      70000 24 Jan 19:16 libtorch_global_deps.so
  48784 -rwxr-xr-x@   1 atalman  staff   24976488 25 Jan 00:09 libtorch_python.so
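
A listing like the one above can also be produced without unpacking the wheel. This is a small sketch, assuming the bundled libraries live under `torch/lib/` inside the archive (that path is an assumption based on the folder mentioned above); `bundled_libraries` is a made-up helper name.

```python
import zipfile


def bundled_libraries(wheel_path, prefix="torch/lib/"):
    """List shared-library file names stored under `prefix` inside a wheel."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return sorted(
            name[len(prefix):]
            for name in wheel.namelist()
            # Keep versioned names such as libcublas.so.12 as well as plain .so files.
            if name.startswith(prefix) and ".so" in name[len(prefix):]
        )
```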

It looks like the reason may be that the poetry config reads the linux_aarch64 metadata rather than the manylinux 2.28 wheel metadata.

@anates
Author
anates commented Feb 10, 2025

@atalman Hi,
Thanks for the feedback! Does that mean poetry is reading the metadata from the wrong location, since torch-2.6.0+cu126 does incorporate all the necessary information? And could you clarify what you mean by "before manylinux 2.28 wheel"?
Thank you very much!

@janeyx99 added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) and removed the triage review label on Feb 10, 2025
@atalman
Contributor
atalman commented Feb 10, 2025

@anates Looks like poetry expects consistent METADATA across all the wheels as per comment: python-poetry/poetry#10152 (comment)
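
To illustrate the mismatch, here is a minimal sketch that compares the `Requires-Dist` lists of several wheels of one release and reports which entries each wheel lacks. It is not how poetry resolves dependencies; `missing_requirements` is a hypothetical helper, and its input is assumed to be a mapping from wheel file name to the list of requirement strings read from that wheel.

```python
def missing_requirements(requires_by_wheel):
    """Map each wheel to the requirements that other wheels declare but it lacks.

    Wheels that declare every requirement seen in the release are omitted,
    so an empty result means the metadata is consistent.
    """
    all_requirements = set().union(*requires_by_wheel.values())
    report = {}
    for wheel, requirements in requires_by_wheel.items():
        missing = all_requirements - set(requirements)
        if missing:
            report[wheel] = sorted(missing)
    return report
```

For the situation described in this thread, passing the manylinux_2_28_x86_64 wheel with its full list and the linux_aarch64 wheel with an empty list would report every CUDA and triton requirement as missing from the aarch64 wheel.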

We will be fixing this for Release 2.6.1

@tmm1
Contributor
tmm1 commented Feb 21, 2025

I had run into a similar issue with astral-sh/uv#10693 and opened #145021

@atalman
Contributor
atalman commented Mar 5, 2025

@ZainRizvi we would need to add a METADATA section to the following build:
https://download.pytorch.org/whl/cu126/torch-2.6.0%2Bcu126-cp310-cp310-linux_aarch64.whl
This applies to the CUDA 12.6 and 12.8 AARCH64 builds.

This is the workflow that builds this binary:
https://github.com/pytorch/pytorch/actions/runs/13670795797/job/38220438638

Triton is populated here: https://github.com/pytorch/pytorch/blob/main/.circleci/scripts/binary_populate_env.sh#L83

Via: PYTORCH_EXTRA_INSTALL_REQUIREMENTS
https://github.com/pytorch/pytorch/blob/main/.ci/manywheel/build_cuda.sh#L131
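
As a rough sketch of how such a variable can be turned into a list of requirements: the snippet below assumes `PYTORCH_EXTRA_INSTALL_REQUIREMENTS` holds requirement strings separated by `|`. That separator is an assumption here and should be checked against the linked scripts. The function name is made up for illustration.

```python
import os


def extra_install_requirements(environ=os.environ):
    """Split PYTORCH_EXTRA_INSTALL_REQUIREMENTS into individual requirement strings."""
    raw = environ.get("PYTORCH_EXTRA_INSTALL_REQUIREMENTS", "")
    # Assumed format: 'pkg-a==1.0; marker | pkg-b==2.0; marker'
    return [part.strip() for part in raw.split("|") if part.strip()]
```

If the variable is unset for a given build, the result is an empty list, which would match a wheel whose METADATA carries none of these `Requires-Dist` entries.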

@ZainRizvi
Contributor

Resolved by #145021
