Add cusolver gesvdj and gesvdjBatched to the backend of torch.svd #48436
Benchmark for A100: https://github.com/xwang233/code-snippet/tree/master/linalg/svd/A100
Benchmark for V100: https://github.com/xwang233/code-snippet/tree/master/linalg/svd/V100
Benchmark on RTX 2070 Super + E5 2680 v3
Benchmark time is in ms (10^-3 s).
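For readers who want to reproduce numbers in the same unit, here is a minimal timing sketch in Python. It is not taken from the linked benchmark scripts; the helper name time_ms and its parameters are hypothetical, and it simply reports the mean wall-clock time per call in milliseconds.

```python
import time


def time_ms(fn, repeats=10, sync=None):
    """Return the mean wall-clock time of fn() in milliseconds.

    `sync` is an optional zero-argument callable (an assumption of this
    sketch) that blocks until pending work has finished. It is called once
    before the timer starts and once before the timer stops.
    """
    if sync is not None:
        sync()  # make sure earlier work does not leak into the measurement
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    if sync is not None:
        sync()  # wait for all timed work to complete
    elapsed_s = time.perf_counter() - start
    return elapsed_s * 1e3 / repeats  # seconds -> milliseconds, per call
```

When timing GPU work such as torch.svd on a CUDA tensor, one would likely pass sync=torch.cuda.synchronize, because CUDA kernels are launched asynchronously and the host timer would otherwise stop before the kernels finish.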
💊 CI failures summary and remediations
As of commit 3075adc (more details on the Dr. CI page): ci.pytorch.org: 1 failed
@heitorschueroff This PR is ready to go.
@heitorschueroff has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Thank you @xwang233, I've imported it to Phabricator.
Yeah, looks like it. They are for the lascl function (scales a matrix by a scalar); it's unclear which library it should come from.
I tried this on cuda 10.2 with |
@seemethere, @malfet any ideas on why the manywheel build is failing? What's the best way to repro it?
Thanks @malfet. We were able to reproduce the exact error message of
FYI, found doc here https://docs.nvidia.com/cuda/cusolver/index.html#static-link-lapack
@xwang233 cool, can you please add the above-mentioned dependency to pytorch/cmake/public/cuda.cmake (lines 334 to 336 in 5016637), with if(CUDA_VERSION VERSION_GREATER_EQUAL 10.2) if needed?
That is a very strange ROCm error. I don't think it's related.
It's been showing up on other PRs as well |
@heitorschueroff merged this pull request in 186c3da.
This PR adds cusolver gesvdj and gesvdjBatched to the backend of torch.svd.

I've tested the performance using cuda 11.1 on 2070, V100, and A100. cusolver gesvdj and gesvdjBatched perform better than magma in all square matrix cases, so the cusolver backend will replace the magma backend when available.

When both matrix dimensions are no greater than 32, gesvdjBatched is used. Otherwise, gesvdj is used.

A detailed benchmark is available at https://github.com/xwang233/code-snippet/tree/master/linalg/svd.
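The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the stated heuristic, not the PR's actual code (the real dispatch lives in PyTorch's C++ backend); the function name choose_svd_backend and its parameters are hypothetical, and the returned strings are just labels.

```python
def choose_svd_backend(m, n, cusolver_available=True, batched_max_dim=32):
    """Pick a backend label for the SVD of an m x n matrix.

    Hypothetical helper mirroring the rule in the PR description:
    - without cusolver, fall back to magma;
    - if both dimensions are <= 32, use gesvdjBatched;
    - otherwise use gesvdj.
    """
    if not cusolver_available:
        return "magma"
    if m <= batched_max_dim and n <= batched_max_dim:
        return "gesvdjBatched"
    return "gesvdj"


print(choose_svd_backend(16, 32))    # both dims <= 32
print(choose_svd_backend(33, 32))    # one dim exceeds 32
print(choose_svd_backend(8, 8, cusolver_available=False))
```

Note that the threshold is inclusive: a 32 x 32 matrix still takes the batched path, while a 33 x 32 matrix does not.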
Some relevant code and discussions
See also #42666 #47953
Close #50516