Add cuSOLVER path for torch.linalg.qr #56256
With the cuSOLVER path, running `pytest test/test_ops.py -k 'linalg_qr' --durations=5` takes about 1 minute less locally. Ref. #51552
Time spent for running
MAGMA:
Here is a MAGMA vs cuSOLVER comparison for non-batched square inputs for modes
MAGMA is only faster than cuSOLVER for large size inputs and
@@ -1777,7 +1777,7 @@ void linalg_qr_out_helper(const Tensor& input, const Tensor& Q, const Tensor& R,
   orgqr_stub(input.device().type(), const_cast<Tensor&>(Q), tau);
 }

-std::tuple<Tensor, Tensor> _linalg_qr_helper_cpu(const Tensor& input, std::string mode) {
+std::tuple<Tensor, Tensor> _linalg_qr_helper_default(const Tensor& input, std::string mode) {
Why "default" and not "cpu"?
We now have `linalg_qr_helper_magma`, which uses MAGMA for the QR decomposition. It can't be implemented using `geqrf_stub` + `orgqr_stub`, because `orgqr_stub` only supports cuSOLVER for CUDA inputs. In addition, MAGMA doesn't follow the LAPACK API for the `geqrf` and `orgqr` operations that together form the QR decomposition. That's why we need a separate function for MAGMA.
The other function is named `_linalg_qr_helper_default`, with "_default" rather than "_cpu", because it supports both CPU and CUDA inputs; for CUDA inputs, cuSOLVER and cuBLAS are used.
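For readers unfamiliar with the two LAPACK-style steps mentioned above, here is a minimal pure-Python sketch of what `geqrf` (Householder factorization) and `orgqr` (forming Q from the reflectors) compute. This is not the PyTorch, cuSOLVER, or MAGMA implementation. The function names only mirror the LAPACK routine names, and `qr_reduced` is a hypothetical helper added for illustration.

```python
import math


def geqrf(a):
    """Householder QR of an m x n matrix given as a list of rows.

    Returns (packed, tau). As in LAPACK's geqrf, R sits in the upper
    triangle of `packed`, and each reflector vector sits below the
    diagonal with an implicit leading 1.
    """
    m, n = len(a), len(a[0])
    a = [row[:] for row in a]  # work on a copy
    k = min(m, n)
    tau = [0.0] * k
    for c in range(k):
        alpha = a[c][c]
        xnorm = math.sqrt(sum(a[i][c] ** 2 for i in range(c + 1, m)))
        if xnorm == 0.0:
            continue  # column already reduced; reflector is the identity
        beta = -math.copysign(math.hypot(alpha, xnorm), alpha)
        tau[c] = (beta - alpha) / beta
        scale = 1.0 / (alpha - beta)
        for i in range(c + 1, m):
            a[i][c] *= scale  # store v below the diagonal
        a[c][c] = beta
        # Apply H = I - tau * v * v^T to the trailing columns.
        for j in range(c + 1, n):
            w = a[c][j] + sum(a[i][c] * a[i][j] for i in range(c + 1, m))
            w *= tau[c]
            a[c][j] -= w
            for i in range(c + 1, m):
                a[i][j] -= a[i][c] * w
    return a, tau


def orgqr(packed, tau):
    """Form the m x k matrix Q = H_0 H_1 ... H_{k-1} from geqrf output."""
    m, k = len(packed), len(tau)
    q = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(m)]
    # Apply the reflectors to the identity columns in reverse order.
    for r in reversed(range(k)):
        v = [0.0] * m
        v[r] = 1.0
        for i in range(r + 1, m):
            v[i] = packed[i][r]
        for j in range(k):
            w = tau[r] * sum(v[i] * q[i][j] for i in range(r, m))
            for i in range(r, m):
                q[i][j] -= v[i] * w
    return q


def qr_reduced(a):
    """Hypothetical helper: reduced QR as geqrf followed by orgqr."""
    packed, tau = geqrf(a)
    k, n = len(tau), len(packed[0])
    r = [[packed[i][j] if j >= i else 0.0 for j in range(n)] for i in range(k)]
    return orgqr(packed, tau), r
```

Calling `qr_reduced([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])` returns a 3 x 2 `Q` with orthonormal columns and a 2 x 2 upper-triangular `R` whose product reproduces the input up to rounding. The two-step split is what lets one helper serve both CPU and CUDA, as long as each backend provides both steps with this interface.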
Time to start landing the second part of this stack! @xwang233 would you take a look at this PR in the stack?
The PR is very concise and LGTM.
Stamped
Summary:
Pull Request resolved: pytorch#56256

Using cuSOLVER path with `pytest test/test_ops.py -k 'linalg_qr' --durations=5` cuts the runtime for these tests by 1 minute locally. See pytorch#56256 (comment).

Performance comparison: pytorch#56256 (comment).

Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D27960154
Pulled By: mruberry
fbshipit-source-id: 5312330d82337dec2856ec5527156a3a547a0b50
Stack from ghstack:
Using cuSOLVER path with `pytest test/test_ops.py -k 'linalg_qr' --durations=5` cuts the runtime for these tests by 1 minute locally. See #56256 (comment).
Performance comparison: #56256 (comment).
Differential Revision: D27960154