[CUDA][Linalg] Add gesvd as SVD fallback; optimize SVD gesvdj performance by xwang233 · Pull Request #64533 · pytorch/pytorch · GitHub

[CUDA][Linalg] Add gesvd as SVD fallback; optimize SVD gesvdj performance #64533


Closed
wants to merge 19 commits into from
linalg test?
xwang233 committed Sep 10, 2021
commit ff63ebe6208947a96da9f5a8c8f4099326f61a50
22 changes: 13 additions & 9 deletions test/test_linalg.py
@@ -2896,15 +2896,19 @@ def test_svd_errors_and_warnings(self, device, dtype):
# error from out_v
svd(a, out=(out_u, out_s, out_v))

-# # if input contains NaN then an error is triggered for svd
-# a = torch.full((3, 3), float('nan'), dtype=dtype, device=device)
-# a[0] = float('nan')
-# with self.assertRaisesRegex(RuntimeError, "The algorithm failed to converge"):
-# svd(a)
-# a = torch.randn(3, 33, 33, dtype=dtype, device=device)
-# a[1, 0, 0] = float('nan')
-# with self.assertRaisesRegex(RuntimeError, r"\(Batch element 1\): The algorithm failed to converge"):
-# svd(a)
+# if input contains NaN then an error is triggered for svd
+error_msg = 'The algorithm failed to converge' if self.device_type == 'cpu' \
+    else 'CUSOLVER_STATUS_EXECUTION_FAILED'
+a = torch.full((3, 3), float('nan'), dtype=dtype, device=device)
+a[0] = float('nan')
+with self.assertRaisesRegex(RuntimeError, error_msg):
+    svd(a)
+error_msg = r'\(Batch element 1\): The algorithm failed to converge' if self.device_type == 'cpu' \
+    else 'CUSOLVER_STATUS_EXECUTION_FAILED'
Collaborator Author
@xwang233 xwang233 Sep 10, 2021


I'm not quite certain how to change this test. cuSOLVER gesvd reports CUSOLVER_STATUS_EXECUTION_FAILED when the matrix contains NaN.

cc @IvanYashchuk

Collaborator


Could we do something like this?
#64818 (comment)

Collaborator Author


> About the error, a cool way to handle this would be to do an input.isnan().any() before parsing the error message in the function that parses the error

I don't think this is a good idea. First, Tensor.isnan() takes extra time. Second, branching on if (Tensor.isnan().any()) forces a device-to-host sync, which hurts performance further.

Collaborator


Note that I propose doing this only once we know that an error has occurred, so we would only be pessimizing the code path that is about to throw an exception.
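
The proposal in this thread (scan for NaN only after the solver has already failed) can be sketched in plain Python. This is an illustration under stated assumptions, not PyTorch's implementation: `backend_svd_stub`, `svd_with_diagnosis`, and the replacement error text are hypothetical names, and nested lists stand in for tensors.

```python
import math


def backend_svd_stub(rows):
    # Hypothetical stand-in for a GPU solver call. It scans for NaN only to
    # simulate the opaque failure status described in the thread above.
    if any(math.isnan(x) for row in rows for x in row):
        raise RuntimeError("CUSOLVER_STATUS_EXECUTION_FAILED")
    return "ok"


def svd_with_diagnosis(rows, solver=backend_svd_stub):
    try:
        # Success path: no extra NaN scan, so no extra cost.
        return solver(rows)
    except RuntimeError as err:
        # Error path only: the call is about to raise anyway, so the extra
        # scan does not slow down successful calls.
        if any(math.isnan(x) for row in rows for x in row):
            # Hypothetical message text, chosen for this sketch.
            raise RuntimeError("svd failed: the input contains NaN") from err
        raise
```

With a real tensor the scan would be a device reduction followed by a host read, which is the synchronization cost raised earlier in the thread; in this sketch that cost would be paid only on the failing path.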

+a = torch.randn(3, 33, 33, dtype=dtype, device=device)
+a[1, 0, 0] = float('nan')
+with self.assertRaisesRegex(RuntimeError, error_msg):
+    svd(a)

@skipCUDAIfNoMagmaAndNoCusolver
@skipCPUIfNoLapack
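
The device-dependent message selection in the hunk above could be factored into a small helper. The following is a minimal sketch, assuming the two messages shown in the diff; `expected_nan_error` is a hypothetical name and is not part of `test/test_linalg.py`.

```python
import re


def expected_nan_error(device_type, batch_index=None):
    # Hypothetical helper mirroring the diff: CPU reports a convergence
    # failure (with a batch prefix for batched input), any other device
    # reports the raw cuSOLVER status.
    if device_type == "cpu":
        msg = "The algorithm failed to converge"
        if batch_index is not None:
            msg = rf"\(Batch element {batch_index}\): " + msg
        return msg
    return "CUSOLVER_STATUS_EXECUTION_FAILED"


# The returned string is a regex, suitable for assertRaisesRegex.
pattern = expected_nan_error("cpu", batch_index=1)
matched = re.search(pattern, "(Batch element 1): The algorithm failed to converge")
```

A test could then pass `expected_nan_error(self.device_type)` or `expected_nan_error(self.device_type, 1)` to `assertRaisesRegex` instead of repeating the conditional expression.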