8000
We read every piece of feedback, and take your input very seriously.
To see all available qualifiers, see our documentation.
There was an error while loading. Please reload this page.
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #64237 Fix #28293 Fix #4689
See also #47953
cc @ngimel @jianyuh @nikitaved @pearu @mruberry @walterddr @IvanYashchuk @xwang233 @lezcano
Sorry, something went wrong.
gesvd init
4c3d083
svd uses gesvdj as default
1645440
split gesvdj buffersize
fa894ea
split gesvd buffersize
9920276
zero numel shortcut
5a1142b
As of commit af60e5c (more details on the Dr. CI page):
💚 💚 Looks good so far! There are no failures yet. 💚 💚
Please report bugs/suggestions to the (internal) Dr. CI Users group.
There was a problem hiding this comment.
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR Xiao! Left a few comments. Not that this PR also Fixes #28293
Very slick! If you want, via this trick you could implement #9083 for gesvdj which would save a copy in many cases.
gesvdj
This PR also partially addresses #4689 . Perhaps even with the previous work from @IvanYashchuk it fully addresses it?
@IvanYashchuk confirmed that this PR also Fixes #4689
On a different note, if I'm not mistaken, if the input matrix has NaNs, the current behaviour would be to execute both algorithms before failing with no error in the TORCH_CUSOLVER_CHECK. Since we never optimise for errors, the fact that we execute both algorithms before failing is better than synchronising and checking for nans before launching svd, but the error part should hopefully be improved. Would it be fair to say that gesvd just returns CUSOLVER_STATUS_INTERNAL_ERROR when there are NaNs in the input? If so, we could check for that and throw a better error message then.
NaN
TORCH_CUSOLVER_CHECK
svd
gesvd
CUSOLVER_STATUS_INTERNAL_ERROR
Also, should we document this behaviour? If so, how should we advertise this? cc @mruberry I am tempted to say that it's fine not to write this in the docs as it's more of an "implementation detail" and (at the moment) there's not much that the user can do about it. Even more, the user will be informed of this behaviour once it happens (if it happens).
I have only minor suggestions like using of full_matrices for clarity and using const references. I'd also like to see the clarification on view+copy in gesvd path for the VT tensor.
full_matrices
VT
Merge remote-tracking branch 'upstream/master' into svd_gpu_gevsd_fal…
41d74db
…lback
@lezcano, I agree, it's ok not to write in the docs, as you say, it's implementation detail, and we don't optimize for fast throwing of error.
Merging #64533 (eac2a4a) into master (88c0ea9) will decrease coverage by 0.07%. The diff coverage is n/a.
0.07%
n/a
@@ Coverage Diff @@ ## master #64533 +/- ## ========================================== - Coverage 66.73% 66.65% -0.08% ========================================== Files 710 714 +4 Lines 92435 92546 +111 ========================================== + Hits 61682 61685 +3 - Misses 30753 30861 +108
move gesvd fallback outside if-else
47d1499
temporarily skip svd nan test
2cad9c6
better formatted error message
5e116b9
vt_workspace explanations
a24db0a
b24a26c
linalg test?
ff63ebe
I'm not quite certain how to change this test. Cusolver gesvd reports CUSOLVER_STATUS_EXECUTION_FAILED when the matrix contains nan
CUSOLVER_STATUS_EXECUTION_FAILED
nan
cc @IvanYashchuk
Could we do something like this? #64818 (comment)
About the error, a cool way to handle this would be to do an input.in_nan().any() before parsing the error message in the function that parses the error
I don't think this is a good idea. First, Tensor.isnan() takes extra time. Besides that, if (Tensor.isnan().any()) will cause a device-host sync, which further complicates the performance.
Tensor.isnan()
if (Tensor.isnan().any())
Note that I propose doing this whenever we know that an error has occured, so we'll be pesimizing the code before throwing an exception.
Since we don't have a switch to switch the backend of gesvd, gesvdj, or gesvdjBatched, I can't find a good way to test this gesvd implementation in the CI.
gesvdjBatched
I tested this locally by replacing the apply_svd_lib_gesvdj below with apply_svd_lib_gesvd, and all linalg tests passed. Also tested numerical accuracy of various shapes with this script https://github.com/xwang233/code-snippet/blob/master/linalg/svd/svdprof.py.
apply_svd_lib_gesvdj
apply_svd_lib_gesvd
fix gesvd when m < n
eac2a4a
cc @mruberry @ngimel The PR is ready to go
@lezcano @IvanYashchuk do you have any additional comments on this PR?
I think that there's just this minor point standing: #64533 (comment) Otherwise this LGTM
6631fa4
Ruleset - Version: v1 Ruleset - File: https://github.com/xwang233/pytorch/blob/af60e5c15bfae21146bc1e346704ce3e265c91d9/.github/generated-ciflow-ruleset.json PR ciflow labels: ciflow/default,ciflow/all
v1
ciflow/default,ciflow/all
ciflow/all
ciflow/cpu
ciflow/linux
ciflow/cuda
ciflow/libtorch
ciflow/slow
ciflow/default
ciflow/noarch
ciflow/xla
ciflow/vulkan
ciflow/mobile
ciflow/sanitizers
ciflow/onnx
ciflow/bazel
ciflow/scheduled
ciflow/slow-gradcheck
ciflow/win
# ciflow rerun, "ciflow/default" will always be added automatically @pytorchbot ciflow rerun # ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and trigger the rerun @pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow
For more information, please take a look at the CI Flow Wiki.
@mruberry, @ngimel could you please take a look and help merge this PR?
ROCm error is real https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-bionic-rocm4.3.1-py3.6-test1/6123/console
b71fed3
test with rocm
af60e5c
@ngimel ROCm test has been fixed. All tests passed and the PR is ready to go.
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
[CUDA][Linalg] Add gesvd as SVD fallback; optimize SVD gesvdj perform…
204ffd3
…ance (#64533) Summary: Fix #64237 Fix #28293 Fix #4689 See also #47953 cc ngimel jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano Pull Request resolved: #64533 Reviewed By: albanD Differential Revision: D31915794 Pulled By: ngimel fbshipit-source-id: 29ea48696531ced8a48474e891a9e2d5f11e9d7a
test_svd_errors_and_warnings
Fix test_svd_errors_and_warnings warning message when cuda >= 11.5 (#…
2cd48d1
…68683) Summary: In SVD cusolverDnXgesvd computations, When cuda < 11.5, cusolver raises CUSOLVER_STATUS_EXECUTION_FAILED when input contains nan. When cuda >= 11.5, cusolver normally finishes execution and sets info array indicating convergence issue. Related: #68259 #64533 Pull Request resolved: #68683 Reviewed By: dagitses Differential Revision: D32583576 Pulled By: mruberry fbshipit-source-id: f732872522e0bda2703450ffcc64ae3a0d3f5bc0
enable evaluation of svd on gpu
c05e6c0
The issue with the evaluation of the svd on the gpu was fixed in pytorch/pytorch#64533.
torch.pinverse
lezcano lezcano approved these changes
ngimel ngimel approved these changes
IvanYashchuk IvanYashchuk approved these changes
nikitaved Awaiting requested review from nikitaved
Successfully merging this pull request may close these issues.