Fix the non-converging issue of SVD on GPU for large matrices · Issue #64237 · pytorch/pytorch · GitHub

Closed
xwang233 opened this issue Aug 31, 2021 · 5 comments
Labels
module: linear algebra (Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply matmul)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

xwang233 (Collaborator) commented Aug 31, 2021

🚀 Feature

SVD on GPU for a large matrix (usually with size > 1024) or an ill-conditioned matrix may throw a runtime error because the solver does not converge.

SVD on GPU currently uses the iterative Jacobi method of cusolverDn<T>gesvdj (note the trailing j). The PyTorch implementation uses the cuSOLVER default values of max sweeps = 100 and tolerance = machine precision. The cuSOLVER gesvdj engine stops when either the tolerance or the maximum number of sweeps is reached. According to the cuSOLVER documentation, ~15 sweeps are usually enough.

The gesvdj method works well and is much faster than the QR-based cusolverDn<T>gesvd method for small matrices (benchmark #48436 (comment)). However, for large matrices, we've received several reports of runtime errors because the SVD does not converge, e.g. #28293 (comment)
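For context, a minimal sketch of the failure mode. The matrix below is only illustrative and is not guaranteed to trigger the error on any given PyTorch / cuSOLVER version; the exact error message also varies.

```python
import torch

# Illustrative only: a large, deliberately ill-conditioned matrix on the GPU.
# Whether this actually fails depends on the PyTorch / cuSOLVER version and the GPU.
n = 4096
a = torch.randn(n, n, device="cuda")
a = a @ a.T        # make the spectrum decay quickly
a[:, 0] *= 1e-8    # worsen the conditioning

try:
    u, s, vh = torch.linalg.svd(a)
except RuntimeError as e:
    # With the gesvdj path, non-convergence typically surfaces as a
    # RuntimeError mentioning cusolver / convergence.
    print("SVD failed to converge:", e)
```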

To fix this issue, there are several possible approaches:

  1. Expose the max sweeps and tolerance of cuSOLVER gesvdj to the user, so that people can set them to stricter values as needed.
    Problem: what should the user interface look like? Extra kwargs for torch.linalg.svd? Note that the CPU LAPACK method doesn't have this issue and wouldn't use those parameters.

  2. Add the gesvd method to PyTorch.
    Problem: what should the user interface look like? The only entry point for SVD is torch.linalg.svd, so how should the user switch the backend engine? Extra kwargs or extra Python functions?

  3. Add the gesvd method to PyTorch, and use gesvd as a fallback when gesvdj doesn't converge (see the sketch after the note below).
    Problem: this would make SVD performance much worse in those cases.

Note that the QR-based gesvd method always converges. This issue doesn't affect the batched gesvdjBatched method, which only accepts matrices with size <= 32 and converges well in nearly all cases.
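For illustration, a user-level sketch of the approach-3 fallback. It assumes a PyTorch build where torch.linalg.svd accepts the CUDA-only driver= keyword (newer releases do; this issue predates it), and that an explicit non-converging gesvdj call surfaces as a RuntimeError:

```python
import torch

def svd_with_fallback(a: torch.Tensor):
    # Sketch of approach 3, expressed at the user level. Assumes the
    # CUDA-only driver= keyword of torch.linalg.svd (present in newer
    # PyTorch releases; not available when this issue was filed).
    try:
        # Fast iterative Jacobi driver; may fail to converge for large
        # or ill-conditioned matrices.
        return torch.linalg.svd(a, driver="gesvdj")
    except RuntimeError:
        # QR-based driver: slower, but always converges.
        return torch.linalg.svd(a, driver="gesvd")
```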

cc @jianyuh @nikitaved @pearu @mruberry @heitorschueroff @walterddr @IvanYashchuk @xwang233 @lezcano @ptrblck

@xwang233 xwang233 added the module: linear algebra label Aug 31, 2021
lezcano (Collaborator) commented Aug 31, 2021

In my opinion, the third option is the best one.

I used to face this problem very rarely (say while training a network once every 10 epochs) and it would make the training fail, which was very annoying. As such, if we just launch gesvd as a fallback, the amortised cost will be close to zero, and all this would be completely transparent to the user.

Note that the solution I used (and the one proposed in the mentioned post) is to just have a try-catch; in the catch, one perturbs the original matrix with some noise and computes the SVD of that. This effectively also incurs computing two SVDs, but, again, this error is so infrequent that it's fine.
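A minimal sketch of that try-catch workaround at the user level; the noise scale below is an arbitrary illustrative choice, not a recommended value.

```python
import torch

def robust_svd(a: torch.Tensor, noise_scale: float = 1e-7):
    """Workaround sketch: if the GPU SVD fails to converge, perturb the
    input slightly and retry. noise_scale is arbitrary and illustrative."""
    try:
        return torch.linalg.svd(a)
    except RuntimeError:
        # Perturb proportionally to the magnitude of the input so the
        # result stays close to the SVD of the original matrix.
        eps = noise_scale * a.abs().mean()
        return torch.linalg.svd(a + eps * torch.randn_like(a))
```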

lezcano (Collaborator) commented Aug 31, 2021

There is a similar problem with eigh #59720 (and I suspect also with eig), so we should see whether we have similar fallback algorithms for these other two. I'd assume there is one for eigh, but eig is infamous for not being the most stable of functions...

lezcano (Collaborator) commented Aug 31, 2021

This is a similar one, but it fails silently: #24466
For this second one we should probably just mention in the docs that single (float32) precision may not be enough for these algorithms when the matrices are ill-conditioned.

mruberry (Collaborator) commented Aug 31, 2021

Per offline discussion:

  • the fallback sounds great
  • a separate issue to add a "driver"-like argument (like lstsq has) would be interesting (we should consider giving the drivers meaningful names as well as their algorithmic names)
    • alternatively the keyword might be something like "exact"
  • we don't think we should add the more precise CUDA-specific kwargs to linalg.svd at this time (but maybe a "lower level" function in the future could expose them?)
    • structure work may be able to expose these options in the future, too
  • the fallback to the alternative "more precise" algorithm must be carefully documented, because users might complain about us being different from other libraries and be confused about the performance of linalg.svd
    • Let's add a warning when we fall back? SciPy has "no error" flags that suppress these warnings
    • TORCH_WARN_ONCE might be a happy compromise
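For illustration, a Python-level sketch of the warn-once idea discussed above; the real implementation would use the C++ TORCH_WARN_ONCE macro inside the operator, and the warning text here is hypothetical.

```python
import warnings

_warned_svd_fallback = False

def _warn_svd_fallback_once():
    """Sketch of warn-once semantics: emit the fallback warning at most
    once per process so users are informed without being spammed."""
    global _warned_svd_fallback
    if not _warned_svd_fallback:
        warnings.warn(
            "torch.linalg.svd: gesvdj did not converge; falling back to the "
            "slower QR-based gesvd driver.",
            RuntimeWarning,
        )
        _warned_svd_fallback = True
```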

@zou3519 zou3519 added the triaged label Sep 1, 2021
@xwang233 xwang233 self-assigned this Sep 3, 2021
o-alexandre-felipe commented Sep 14, 2021

Do you think a parallel QR iteration for tridiagonal systems would help with any of this?
As far as I know, the method used for large matrices is divide and conquer, in which sub-matrices are diagonalized and then combined with a rank-1 perturbation.
I have read that the QR iteration method is still the best on a serial computer.
I framed the sub-diagonal elimination step of the QR iteration as an associative operation (in an augmented space).
If the serial elimination of the sub-diagonal has complexity c1 * N on a serial computer, this would have complexity c2 * log(N) on a fully parallel computer, or c2 * N * log(N) / P when we have P processors.
I don't know if it is worth implementing or not :D

facebook-github-bot pushed a commit that referenced this issue Oct 26, 2021
…ance (#64533)

Summary:
Fix #64237
Fix #28293
Fix #4689

See also #47953

cc ngimel jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Pull Request resolved: #64533

Reviewed By: albanD

Differential Revision: D31915794

Pulled By: ngimel

fbshipit-source-id: 29ea48696531ced8a48474e891a9e2d5f11e9d7a