Add cusolver gesvdj and gesvdjBatched to the backend of torch.svd #48436
@@ -239,7 +239,14 @@ static inline std::tuple<std::vector<int64_t>,
 }
 
 // Function to generate empty tensors of required size, strides and dtype for the SVD operation
-static inline std::tuple<Tensor, Tensor, Tensor> _create_U_S_VT(const Tensor& input, bool some, bool compute_uv) {
+static inline std::tuple<Tensor, Tensor, Tensor> _create_U_S_VT(const Tensor& input, bool some, bool compute_uv,
+                                                                const bool svd_use_cusolver=false) {
+
+  // U, S, VT are initialized as empty tensors.
+  // For CPU LAPACK and GPU MAGMA backend, the tensors are initialized on CPU.
+  // For GPU cuSOLVER backend, the tensors are initialized on GPU.
+  const auto usvt_device = svd_use_cusolver ? at::kCUDA : at::kCPU;
+
   auto sizes = input.sizes().vec();
   int64_t m = input.size(-2), n = input.size(-1);
 
@@ -251,47 +258,21 @@ static inline std::tuple<Tensor, Tensor, Tensor> _create_U_S_VT(const Tensor& in
   strides[input.dim() - 1] = m;
   strides[input.dim() - 2] = 1;
-  Tensor U_empty;
-  if (!input.is_cuda()) {
-    U_empty = at::empty_strided(sizes, strides, input.options());
-  } else {
-    // NB: U_empty is an empty tensor created on the CPU intentionally, because magma_(d/s)gesdd
-    // (which is the driver routine for the divide and conquer SVD operation)
-    // takes in arrays on the CPU as input. This routine is a hybrid CPU-GPU routine that
-    // moves the inputs between devices internally.
-    U_empty = at::empty_strided(sizes, strides, input.options().device(at::kCPU));
-  }
+  Tensor U_empty = at::empty_strided(sizes, strides, input.options().device(usvt_device));
+  U_empty.zero_();
 
-  // VT should be a column-major or a batch of column-major matrices
   sizes[input.dim() - 2] = n;
   sizes[input.dim() - 1] = n;
-  strides = at::detail::defaultStrides(sizes);
-  strides[input.dim() - 1] = n;
-  strides[input.dim() - 2] = 1;
-  Tensor VT_empty;
-  if (!input.is_cuda()) {
-    VT_empty = at::empty_strided(sizes, strides, input.options());
-  } else {
-    // NB: VT_empty is an empty tensor created on the CPU intentionally, because magma_(d/s)gesdd
-    // (which is the driver routine for the divide and conquer SVD operation)
-    // takes in arrays on the CPU as input. This routine is a hybrid CPU-GPU routine that
-    // moves the inputs between devices internally.
-    VT_empty = at::empty_strided(sizes, strides, input.options().device(at::kCPU));
-  }
+  // VT should be a column-major or a batch of column-major matrices
+  Tensor VT_empty = at::zeros(sizes, input.options().device(usvt_device));
+  VT_empty.transpose_(-2, -1);
Review comment: I think that the code is correct but the comment is wrong. Moreover, it contradicts the comment at line 264, which says …
 
   sizes.pop_back();
   sizes[input.dim() - 2] = std::min(m, n);
-  Tensor S_empty;
   ScalarType dtype = toValueType(typeMetaToScalarType(input.dtype()));
-  if (!input.is_cuda()) {
-    S_empty = at::empty(sizes, input.options().dtype(dtype));
-  } else {
-    // NB: S_empty is an empty tensor created on the CPU intentionally, because magma_(d/s)gesdd
-    // (which is the driver routine for the divide and conquer SVD operation)
-    // takes in arrays on the CPU as input. This routine is a hybrid CPU-GPU routine that
-    // moves the inputs between devices internally.
-    S_empty = at::empty(sizes, input.options().dtype(dtype).device(usvt_device));
+  Tensor S_empty = at::empty(sizes, input.options().dtype(dtype).device(usvt_device));
 
   return std::tuple<Tensor, Tensor, Tensor>(U_empty, S_empty, VT_empty);
 }
Review comment: I think that this comment was useful for understanding why we want to allocate these tensors on the CPU when we use MAGMA (at least, it was helpful to me when I first looked at this code), so it may be worth resurrecting it and putting it close to
usvt_device = ...
above.