[CUDA][Linalg] Add gesvd as SVD fallback; optimize SVD gesvdj performance by xwang233 · Pull Request #64533 · pytorch/pytorch

Closed
xwang233 wants to merge 19 commits

Conversation

@xwang233 (Collaborator) commented Sep 6, 2021

@facebook-github-bot (Contributor) commented Sep 6, 2021

🔗 Helpful links

💊 CI failures summary and remediations

As of commit af60e5c (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

@lezcano (Collaborator) left a comment

Thank you for the PR, Xiao! Left a few comments.
Note that this PR also
Fixes #28293

// cusolver gesvd only supports m >= n, we transpose the whole svd calculation if m < n
// i.e.: A = U S V^H --> A^T = V^H^T S U^T
if (m < n) {
apply_svd_lib_gesvd(self.transpose(-2, -1), VT, S, U, infos, compute_uv, some, calculate_all_batches, batches);
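The wide-matrix fallback quoted above can be sketched in pure Python. Here `svd_tall` is a hypothetical stand-in for any SVD routine that only supports m >= n (like cusolver gesvd), not the actual cuSOLVER API; matrices are plain lists of lists for illustration:

```python
def transpose(A):
    """Transpose a list-of-lists matrix."""
    return [list(row) for row in zip(*A)]

def svd_any_shape(svd_tall, A):
    """SVD of A via svd_tall, which only accepts m >= n.

    For a wide A (m < n), use (A^T) = U S Vh with the tall routine;
    then A = (U S Vh)^T = Vh^T S U^T, so the U and V factors are
    swapped and transposed on the way back out.
    """
    m, n = len(A), len(A[0])
    if m >= n:
        return svd_tall(A)
    U, S, Vh = svd_tall(transpose(A))   # A^T = U S Vh
    return transpose(Vh), S, transpose(U)
```

With a tall-only routine plugged in, the wrapper returns factors that reconstruct the original wide matrix, which is exactly the trick the quoted C++ comment describes (in the real code, for complex inputs, conjugate transposes are involved).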
Collaborator:
Very slick!
If you want, via this trick you could implement #9083 for gesvdj which would save a copy in many cases.

@lezcano (Collaborator) commented Sep 6, 2021

This PR also partially addresses #4689. Perhaps, together with the previous work from @IvanYashchuk, it even fully addresses it?

@lezcano (Collaborator) commented Sep 6, 2021

@IvanYashchuk confirmed that this PR also
Fixes #4689

On a different note, if I'm not mistaken, when the input matrix contains NaNs the current behaviour is to execute both algorithms before failing with an uninformative error in the TORCH_CUSOLVER_CHECK. Since we never optimise for errors, executing both algorithms before failing is better than synchronising and checking for NaNs before launching the SVD, but the error reporting should hopefully be improved. Would it be fair to say that gesvd just returns CUSOLVER_STATUS_INTERNAL_ERROR when there are NaNs in the input? If so, we could check for that and then throw a better error message.

@lezcano (Collaborator) commented Sep 6, 2021

Also, should we document this behaviour? If so, how should we advertise this? cc @mruberry
I am tempted to say that it's fine not to write this in the docs as it's more of an "implementation detail" and (at the moment) there's not much that the user can do about it. Even more, the user will be informed of this behaviour once it happens (if it happens).

@mruberry mruberry requested a review from ngimel September 7, 2021 03:33
@IvanYashchuk (Collaborator) left a comment

I have only minor suggestions, such as using full_matrices for clarity and using const references. I'd also like to see a clarification on the view+copy in the gesvd path for the VT tensor.

@mrshenli added labels on Sep 7, 2021: module: cuda (Related to torch.cuda, and CUDA support in general), module: linear algebra (Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
@ngimel (Collaborator) commented Sep 8, 2021

> Also, should we document this behaviour? If so, how should we advertise this? cc @mruberry
> I am tempted to say that it's fine not to write this in the docs as it's more of an "implementation detail" and (at the moment) there's not much that the user can do about it. Even more, the user will be informed of this behaviour once it happens (if it happens).

@lezcano, I agree, it's OK not to write this in the docs; as you say, it's an implementation detail, and we don't optimize for fast throwing of errors.

@codecov (bot) commented Sep 8, 2021

Codecov Report

Merging #64533 (eac2a4a) into master (88c0ea9) will decrease coverage by 0.07%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #64533      +/-   ##
==========================================
- Coverage   66.73%   66.65%   -0.08%     
==========================================
  Files         710      714       +4     
  Lines       92435    92546     +111     
==========================================
+ Hits        61682    61685       +3     
- Misses      30753    30861     +108     

svd(a)
error_msg = r'\(Batch element 1\): The algorithm failed to converge' if self.device_type == 'cpu' \
else 'CUSOLVER_STATUS_EXECUTION_FAILED'
@xwang233 (Collaborator, Author) commented Sep 10, 2021:

I'm not quite certain how to change this test. cuSOLVER gesvd reports CUSOLVER_STATUS_EXECUTION_FAILED when the matrix contains NaN.

cc @IvanYashchuk

Collaborator:
Could we do something like this?
#64818 (comment)

@xwang233 (Author):

> About the error, a cool way to handle this would be to do an input.isnan().any() before parsing the error message in the function that parses the error

I don't think this is a good idea. First, Tensor.isnan() takes extra time. Besides that, if (Tensor.isnan().any()) causes a device-host sync, which further hurts performance.

Collaborator:

Note that I propose doing this only once we know that an error has occurred, so we would only be pessimising the code just before throwing an exception.
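The proposal above (scan for NaNs only on the failure path, so the successful path pays nothing) might look like this sketch; `run_solver` and `has_nan` are hypothetical stand-ins, not PyTorch or cuSOLVER APIs:

```python
def solve_with_better_error(run_solver, has_nan, A):
    """Run the solver; only if it fails, pay for a NaN scan (and, in
    the real CUDA setting, a device-host sync) to improve the error
    message. The happy path is unaffected."""
    status = run_solver(A)
    if status == "ok":
        return status
    # We are about to throw anyway, so extra work here is acceptable.
    if has_nan(A):
        raise ValueError("svd: input contains NaN values")
    raise RuntimeError("svd: solver failed with status " + status)
```

The design point is the same one made in the thread: errors are never the fast path, so pessimising the code between detecting a failure and raising costs nothing in practice.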

@@ -490,6 +669,8 @@ std::tuple<Tensor, Tensor, Tensor> _svd_helper_cuda_lib(const Tensor& self, bool
VT_working_copy.transpose_(-2, -1);
}

// VT_working_copy is row-major, actually "V" now

// heuristic for using `gesvdjBatched` over `gesvdj`
if (m <= 32 && n <= 32 && batch_size > 1 && (!some || m == n)) {
apply_svd_lib_gesvdjBatched(self, U_working_copy, S_working_copy, VT_working_copy, infos, compute_uv);
@xwang233 (Collaborator, Author) commented Sep 10, 2021:

Since we don't have a switch to select the backend among gesvd, gesvdj, and gesvdjBatched, I can't find a good way to test this gesvd implementation in the CI.

I tested this locally by replacing the apply_svd_lib_gesvdj below with apply_svd_lib_gesvd, and all linalg tests passed. Also tested numerical accuracy of various shapes with this script https://github.com/xwang233/code-snippet/blob/master/linalg/svd/svdprof.py.
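The dispatch heuristic quoted in the diff above can be expressed as a small predicate. This is a sketch mirroring the quoted condition, not PyTorch's actual API:

```python
def use_gesvdj_batched(m, n, batch_size, some):
    """Mirror of the quoted heuristic: prefer the batched Jacobi
    kernel (gesvdjBatched) only for small matrices in a genuine
    batch, and only when the 'some' (reduced-SVD) flag does not
    conflict with the batched kernel's square-output requirement."""
    return m <= 32 and n <= 32 and batch_size > 1 and (not some or m == n)
```

Everything that fails the predicate falls through to the per-matrix gesvdj path (with gesvd as the fallback this PR adds).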

@xwang233 xwang233 marked this pull request as draft September 10, 2021 05:50
@xwang233 xwang233 marked this pull request as ready for review September 10, 2021 07:36
@xwang233 (Collaborator, Author):

cc @mruberry @ngimel The PR is ready to go

@ngimel (Collaborator) commented Sep 29, 2021

@lezcano @IvanYashchuk do you have any additional comments on this PR?

@lezcano (Collaborator) left a comment

I think there's just this minor point outstanding: #64533 (comment)
Otherwise this LGTM.

@pytorch-probot (bot) commented Oct 21, 2021
CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/xwang233/pytorch/blob/af60e5c15bfae21146bc1e346704ce3e265c91d9/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default,ciflow/all

Workflows Labels (bold enabled) Status
Triggered Workflows
caffe2-linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux ✅ triggered
libtorch-linux-xenial-cuda10.2-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux ✅ triggered
libtorch-linux-xenial-cuda11.3-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux ✅ triggered
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow ✅ triggered
linux-bionic-py3.6-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/xla ✅ triggered
linux-vulkan-bionic-py3.6-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/vulkan ✅ triggered
linux-xenial-cuda10.2-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow ✅ triggered
linux-xenial-cuda11.3-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile ✅ triggered
linux-xenial-py3-clang5-mobile-code-analysis ciflow/all, ciflow/linux, ciflow/mobile ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-dynamic ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile ✅ triggered
linux-xenial-py3.6-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers ✅ triggered
linux-xenial-py3.6-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx ✅ triggered
linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3.6-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
parallelnative-linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux ✅ triggered
periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled ✅ triggered
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck ✅ triggered
periodic-linux-xenial-cuda11.1-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled ✅ triggered
periodic-win-vs2019-cuda11.1-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/win ✅ triggered
Skipped Workflows

You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and trigger the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

@IvanYashchuk (Collaborator):

@mruberry, @ngimel could you please take a look and help merge this PR?

@ngimel (Collaborator) commented Oct 22, 2021

ROCm error is real https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-bionic-rocm4.3.1-py3.6-test1/6123/console

@xwang233 (Collaborator, Author):

@ngimel ROCm test has been fixed. All tests passed and the PR is ready to go.

@facebook-github-bot (Contributor):

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot pushed a commit that referenced this pull request on Oct 26, 2021:
[CUDA][Linalg] Add gesvd as SVD fallback; optimize SVD gesvdj performance (#64533)

Summary:
Fix #64237
Fix #28293
Fix #4689

See also #47953

cc ngimel jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Pull Request resolved: #64533

Reviewed By: albanD

Differential Revision: D31915794

Pulled By: ngimel

fbshipit-source-id: 29ea48696531ced8a48474e891a9e2d5f11e9d7a
facebook-github-bot pushed a commit that referenced this pull request Nov 23, 2021
…68683)

Summary:
In SVD cusolverDnXgesvd computations,

When CUDA < 11.5, cusolver raises CUSOLVER_STATUS_EXECUTION_FAILED when the input contains NaN.
When CUDA >= 11.5, cusolver finishes execution normally and sets the info array to indicate a convergence issue.

Related: #68259 #64533

Pull Request resolved: #68683

Reviewed By: dagitses

Differential Revision: D32583576

Pulled By: mruberry

fbshipit-source-id: f732872522e0bda2703450ffcc64ae3a0d3f5bc0
zenoone added a commit to zenoone/deepqmc that referenced this pull request Jan 10, 2022
The issue with the evaluation of the svd on the gpu was fixed in
pytorch/pytorch#64533.