Allow cublas an cudnn to be in different nvidia folders by dannyjeck · Pull Request #92122 · pytorch/pytorch · GitHub

Allow cublas an cudnn to be in different nvidia folders #92122


Closed

Conversation

dannyjeck (Contributor)

Fixes #92096
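A hedged sketch of the behavior this change enables: libcublas and libcudnn can be shipped by separate pip wheels and therefore live under different nvidia directories, so each library is resolved independently instead of requiring one common folder. The helper name and structure below are illustrative, not the actual torch/__init__.py code.

```python
import os

def find_cuda_libs(search_paths):
    """Resolve libcublas and libcudnn independently across search paths.

    Each library keeps the first match found, so the two results may come
    from different `nvidia` folders (e.g. separate pip wheels).
    Hypothetical simplification for illustration only.
    """
    cublas_path = None
    cudnn_path = None
    for path in search_paths:
        nvidia_path = os.path.join(path, 'nvidia')
        candidate_cublas = os.path.join(nvidia_path, 'cublas', 'lib', 'libcublas.so.11')
        if os.path.exists(candidate_cublas) and not cublas_path:
            cublas_path = candidate_cublas
        candidate_cudnn = os.path.join(nvidia_path, 'cudnn', 'lib', 'libcudnn.so.8')
        if os.path.exists(candidate_cudnn) and not cudnn_path:
            cudnn_path = candidate_cudnn
    return cublas_path, cudnn_path
```

In the real preload path the resolved files would then be loaded (e.g. via ctypes.CDLL) before the CUDA runtime is used; only the path resolution is sketched here.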

@pytorch-bot bot commented Jan 13, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/92122

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 4ef41de:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@linux-foundation-easycla bot commented Jan 13, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: dannyjeck / name: Daniel Jeck (4ef41de)

@malfet (Contributor) left a comment

Please sign CLA (and keep Linux assert in place), otherwise looks good to me

Comment on lines 149 to 150
# Should only be called on Linux if default path resolution have failed
assert platform.system() == 'Linux', 'Should only be called on Linux'
Contributor

Please keep this check, as .so checks make sense only on Linux
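As a hedged illustration of why the reviewer wants the assert kept: probing for `.so` files only makes sense where shared objects use that suffix. `find_nvidia_lib` below is a hypothetical helper name, not the actual torch code; it mirrors the guard plus first-match resolution under discussion.

```python
import os
import platform

def find_nvidia_lib(base_dirs, rel_path):
    """Return the first existing candidate under base_dirs, else None.

    Probing for `.so` files is only meaningful on Linux, hence the
    assert (mirroring the check the reviewer asked to keep).
    Hypothetical helper for illustration only.
    """
    assert platform.system() == 'Linux', 'Should only be called on Linux'
    for base in base_dirs:
        candidate = os.path.join(base, rel_path)
        if os.path.exists(candidate):
            return candidate
    return None
```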

Comment on lines +155 to +159
candidate_cublas_path = os.path.join(nvidia_path, 'cublas', 'lib', 'libcublas.so.11')
if os.path.exists(candidate_cublas_path) and not cublas_path:
    cublas_path = candidate_cublas_path
Contributor

Suggested change
- candidate_cublas_path = os.path.join(nvidia_path, 'cublas', 'lib', 'libcublas.so.11')
- if os.path.exists(candidate_cublas_path) and not cublas_path:
-     cublas_path = candidate_cublas_path
+ candidate_cublas_path = os.path.join(nvidia_path, 'cublas', 'lib', 'libcublas.so.11')
+ if os.path.exists(candidate_cublas_path) and not cublas_path:
+     if cublas_path is not None:
+         import warnings
+         warnings.warn(f"Replacing previously found cublas {cublas_path} with new candidate {candidate_cublas_path}")
+     cublas_path = candidate_cublas_path

Contributor Author

I'm confused by your suggestion. As written the warning will never happen. Did you mean the following?

if os.path.exists(candidate_cublas_path):
    if cublas_path is not None:
        import warnings
        warnings.warn(f"Replacing previously found cublas {cublas_path} with new candidate {candidate_cublas_path}")
    cublas_path = candidate_cublas_path

Contributor Author

But also, I thought the general pattern is to take libraries from earlier in the path when possible, so shouldn't we avoid the replacement entirely?

@dannyjeck (Contributor Author) Jan 13, 2023

@malfet I've pushed just the assert. LMK if you definitely want these warnings and modified logic.

    cublas_path = candidate_cublas_path
candidate_cudnn_path = os.path.join(nvidia_path, 'cudnn', 'lib', 'libcudnn.so.8')
if os.path.exists(candidate_cudnn_path) and not cudnn_path:
    cudnn_path = candidate_cudnn_path
Contributor

Similar to the above, please add a warning there
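The warning being asked for could look like the following first-match-wins helper, which warns when a later nvidia folder also contains the library and is ignored. This is a hypothetical sketch reconciling the suggestion with the author's point about path order, with assumed names, not the merged torch code.

```python
import os
import warnings

def pick_with_warning(current_path, nvidia_path, subdir, libname):
    """Keep the first library found; warn if a later folder also has one.

    Hypothetical helper illustrating the review discussion, not the
    actual torch code.
    """
    candidate = os.path.join(nvidia_path, subdir, 'lib', libname)
    if not os.path.exists(candidate):
        return current_path
    if current_path is not None:
        warnings.warn(f"Ignoring {candidate}; already found {current_path}")
        return current_path
    return candidate
```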

@dannyjeck (Contributor Author)

I've signed the CLA document. I'm not sure why the check is failing.

@dannyjeck (Contributor Author)

/easycla

@dannyjeck force-pushed the djeck/allow-different-cuda-folders branch from d3345ea to 97176fc on January 13, 2023 at 17:04
@dannyjeck (Contributor Author)

/easycla

@dannyjeck force-pushed the djeck/allow-different-cuda-folders branch from 97176fc to 4ef41de on January 13, 2023 at 17:06
@dannyjeck (Contributor Author)

CLA fixed

@dannyjeck (Contributor Author)

@malfet I think this is good to merge, though it looks like I can't merge it myself.

@malfet (Contributor) commented Jan 23, 2023

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jan 23, 2023
@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


@pytorchmergebot (Collaborator)

Merge failed

Reason: This PR is too stale; the last push date was more than 3 days ago. Please rebase and try again. You can rebase by leaving the following comment on this PR:
@pytorchbot rebase


@dannyjeck (Contributor Author)

@pytorchbot rebase

@pytorch-bot bot commented Jan 24, 2023

You don't have permissions to rebase this PR since you are a first time contributor. If you think this is a mistake, please contact PyTorch Dev Infra.

@malfet (Contributor) commented Jan 24, 2023

@pytorchbot merge -f "Lint is green and this code is not really tested by CI"

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

Labels: ciflow/trunk (Trigger trunk jobs on your pull request), Merged, open source, topic: not user facing

Successfully merging this pull request may close these issues.

_preload_cuda_deps does not work if cublas and cudnn are in different nvidia folders
4 participants