Load cuda deps more aggressively by keith · Pull Request #137059 · pytorch/pytorch · GitHub
Load cuda deps more aggressively #137059


Open · wants to merge 1 commit into main

Conversation

@keith commented Sep 30, 2024

Previously, cuda deps were only loaded if loading the globals library
failed with a cuda shared-library-related error. It's possible for the
globals library to load successfully, but then for the torch native
libraries to fail with a cuda error. This now handles both cases.

Fixes #117350

cc @malfet @seemethere @ptrblck @msaroufim

pytorch-bot bot commented Sep 30, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137059

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 7429bbb with merge base a6210fd:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

linux-foundation-easycla bot commented Sep 30, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: keith / name: Keith Smiley (7429bbb)


This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@drisspg added the module: cuda (Related to torch.cuda, and CUDA support in general) and module: build (Build system issues) labels Oct 1, 2024
@janeyx99 janeyx99 requested a review from eqy October 2, 2024 23:04
@janeyx99 added the triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) label Oct 2, 2024
@keith keith force-pushed the ks/load-cuda-deps-more-aggressively branch 2 times, most recently from 36aaa2e to 31cfdfd Compare November 19, 2024 20:35
@keith (Author) commented Nov 19, 2024

Updated with another case, since sometimes the error doesn't include the missing lib name:

ImportError: dlopen hook: '.../_C.cpython-311-x86_64-linux-gnu.so': cannot open shared object file: No such file or directory
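Errors like the one above name only the calling library (`_C...so`), not the missing CUDA dep, so extracting a lib name from the message can fail. One way to handle that is to fall back to preloading every known CUDA dep when no name can be extracted. The sketch below is illustrative; the helper name and regex are hypothetical, not the actual PyTorch implementation.

```python
import re

# Hypothetical pattern for a CUDA library named in a dlopen error,
# e.g. "libcudart.so.12: cannot open shared object file".
_LIB_PATTERN = re.compile(r"(libcu\w+\.so[\w.]*)")


def cuda_lib_from_error(message):
    """Return the CUDA library named in a dlopen error, or None.

    Some errors ("dlopen hook: '.../_C...so': cannot open shared
    object file") don't name the missing CUDA dep at all, so this can
    legitimately return None; the caller should then preload all
    known CUDA deps rather than a single one.
    """
    match = _LIB_PATTERN.search(message)
    return match.group(1) if match else None
```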

@keith keith force-pushed the ks/load-cuda-deps-more-aggressively branch from 31cfdfd to dd1b62d Compare January 15, 2025 05:05
@keith keith force-pushed the ks/load-cuda-deps-more-aggressively branch from dd1b62d to c9d5142 Compare March 5, 2025 17:35
@keith keith force-pushed the ks/load-cuda-deps-more-aggressively branch 2 times, most recently from 778b01a to 78f0d40 Compare March 24, 2025 18:37
@vihangm commented Mar 31, 2025

@eqy can we get this merged in? or can you help find other reviewers?
this is blocking the ability to use hermetic python builds in bazel with pytorch.

@tristanleclercq

+1

@keith keith force-pushed the ks/load-cuda-deps-more-aggressively branch 3 times, most recently from 9ed8bf8 to c5be05c Compare April 28, 2025 22:22
@keith keith force-pushed the ks/load-cuda-deps-more-aggressively branch from c5be05c to 2a9c685 Compare June 11, 2025 23:47
@keith (Author) commented Jun 11, 2025

@atalman @Divain @malfet could one of you help review this one? or redirect to the right person? 🙏🏻 thanks!

@malfet (Contributor) commented Jun 11, 2025

@keith what's the test plan for this change?

@keith (Author) commented Jun 12, 2025

Similar to #149808, this fixes something that isn't tested in CI. What's the recommendation for that case? I can easily whip up a bazel project that reproduces the issue, but I imagine that wouldn't be helpful beyond a one-time verification? FWIW we've been using this patch in prod since I submitted this change.

@keith keith force-pushed the ks/load-cuda-deps-more-aggressively branch from 2a9c685 to 7429bbb Compare June 12, 2025 02:42
@vihangm commented Jun 12, 2025

My company also uses this patch, and it works well to get pytorch functioning with a bazel hermetic python system. Wanted to chime in here in case that helps.
I also concur that writing a true test would be kinda awkward. A minimal repro might be possible but would require setting up a bazel project.

@keith (Author) commented Jun 24, 2025

@malfet any advice here 🙏🏻

Labels
module: build (Build system issues) · module: cuda (Related to torch.cuda, and CUDA support in general) · open source · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
Development

Successfully merging this pull request may close these issues.

CUDA deps cannot be preloaded under Bazel
7 participants