Add relative and absolute tolerances for matrix_rank, pinv #54151
Comments
Things are probably going to be worse for fp16 or other short-bitwidth types (#35666), so at the very least the thresholds/epsilons should be configurable.
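For context on why a single hard-coded cutoff cannot serve all dtypes: machine epsilon differs by many orders of magnitude between precisions. A minimal NumPy illustration (NumPy used here only for the `finfo` values; this is not PyTorch code):

```python
import numpy as np

# Machine epsilon per dtype: a threshold derived from fp32's eps
# is orders of magnitude too tight for fp16 inputs.
eps16 = float(np.finfo(np.float16).eps)  # ~9.8e-04
eps32 = float(np.finfo(np.float32).eps)  # ~1.2e-07
eps64 = float(np.finfo(np.float64).eps)  # ~2.2e-16

assert eps16 > 1_000 * eps32
assert eps32 > 100_000_000 * eps64
```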
Just to make sure I understand this issue:
Does that sound right, @IvanYashchuk?

That's a great summary! All the points are correct.
If the default cut-offs are tuned for fp32 and do not adapt automagically to fp16 inputs, then the docs should at least suggest cut-offs for fp16 inputs and clearly state that users are expected to call the function with a cut-off matching their dtype.
fp16 inputs are not supported on CPU or CUDA, and I doubt they will be in the near future. If they were supported, it wouldn't be an issue:
This sounds like a good UX change to me. We'll have to be a little careful in the docs about how we promote atol/rtol over the historic arguments and how we clarify the default behavior. I also think changing the default cutoff value for these operations is OK if it simplifies the UX.
Summary of offline discussion. All this applies to
The current default values for tolerances could be too conservative for larger random Gaussian matrices of floating type, for example:

```python
In [25]: a = torch.rand(1, 1, 1024, 1024).cuda()

In [26]: a[..., -1, :] = 0

In [27]: a[..., :, -1] = 0

In [28]: torch.linalg.matrix_rank(a)
Out[28]: tensor([[1019]], device='cuda:0')

In [29]: a.svd()[1].abs().topk(k=5, largest=False)
Out[29]:
torch.return_types.topk(
values=tensor([[[0.0000, 0.0011, 0.0157, 0.0202, 0.0401]]], device='cuda:0'),
indices=tensor([[[
```

Given how important Gaussian matrices are in the context of Machine Learning, looks like
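The sensitivity of the reported rank to the cutoff can be reproduced without a GPU. A NumPy sketch (NumPy's defaults are what PyTorch follows) using a synthetic matrix whose singular values are known by construction:

```python
import numpy as np

# Build a 100x100 matrix with 99 singular values equal to 1
# and one small singular value of 1e-3.
rng = np.random.default_rng(0)
q1, _ = np.linalg.qr(rng.standard_normal((100, 100)))
q2, _ = np.linalg.qr(rng.standard_normal((100, 100)))
s = np.ones(100)
s[-1] = 1e-3
a = q1 @ np.diag(s) @ q2

# NumPy's default tol is sigma_max * max(rows, cols) * eps (~2.2e-14 here),
# so the 1e-3 singular value still counts and the matrix is full rank.
assert np.linalg.matrix_rank(a) == 100

# An explicit (absolute) cutoff above 1e-3 removes it.
assert np.linalg.matrix_rank(a, tol=1e-2) == 99
```

The point is that the "right" rank depends entirely on where the user draws the cutoff, which is why configurable atol/rtol matter.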
The default relative tolerance for

This is not the case according to the documentation:
The current default tolerance for

The default values for

About using SVD, well, matrix_rank uses

As a side note, I don't think that a matrix in R^{1024 x 1024} distributed as U(0,1) will have a singular value close to zero, although it may be close to singular looking at its determinant :)
I agree, fixed to

Another example:

```python
In [4]: a = torch.rand(1024, 1024, device='cuda')

In [5]: torch.linalg.matrix_rank(a)
Out[5]: tensor(1020, device='cuda:0')

In [6]: a.svd()[1].abs().topk(k=5, largest=False)
Out[6]:
torch.return_types.topk(
values=tensor([0.0040, 0.0126, 0.0253, 0.0422, 0.0651], device='cuda:0'),
indices=tensor([1023, 1022, 1021, 1020, 1019], device='cuda:0'))
```
Since it uses SVD, I might skip trying to understand what these atol/rtol are and just implement my own custom thresholding function, in a potentially non-BC-breaking way. Some LAPACK methods, by the way, allow passing a custom thresholding function for singular values.
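A sketch of what such a custom thresholding function could look like on top of SVD. The helper name and its `keep` callback are hypothetical, not an existing NumPy or PyTorch API:

```python
import numpy as np

def matrix_rank_custom(a, keep):
    """Rank with a user-supplied thresholding function.

    `keep` receives the array of singular values (descending order)
    and returns a boolean mask of the ones that should count.
    (Hypothetical helper for illustration only.)
    """
    s = np.linalg.svd(a, compute_uv=False)
    return int(np.count_nonzero(keep(s)))

a = np.diag([1.0, 0.5, 1e-8, 0.0])

# Absolute cutoff: keep singular values above 1e-6.
assert matrix_rank_custom(a, lambda s: s > 1e-6) == 2

# Relative cutoff: keep those above 1e-4 * sigma_max.
assert matrix_rank_custom(a, lambda s: s > 1e-4 * s.max()) == 2
```

Both the atol and rtol behaviors discussed in this thread are special cases of such a callback.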
For

While the behaviour is certainly not consistent with the one this PR proposes, as one multiplies the
@lezcano, what is the correct result?

The current default behavior for
Summary: This pull request introduces new keyword arguments for `torch.linalg.matrix_rank` and `torch.linalg.pinv`: `atol` and `rtol`. Currently, only the tensor overload has default values for either `atol` or `rtol`; the float overload requires both arguments to be specified. FC compatibility: #63102 (comment) Fixes #54151. Fixes #66618. cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano Pull Request resolved: #63102 Reviewed By: H-Huang Differential Revision: D31641456 Pulled By: mruberry fbshipit-source-id: 4c765508ab1657730703e42975fc8c0d0a60eb7c
🚀 Feature

Both `torch.linalg.matrix_rank` and `torch.linalg.pinv` calculate singular values of the provided matrix and truncate them based on the specified tolerance (the argument is called `rcond` for `torch.linalg.pinv` and `tol` for `torch.linalg.matrix_rank`).

The currently implemented behavior for setting the tolerance, and the default tolerance values, follow NumPy. However, NumPy is not consistent in the default values, nor in treating the provided tolerance as relative or absolute:
| function | argument | tolerance (if default) | tolerance (if specified) |
| --- | --- | --- | --- |
| `numpy.linalg.matrix_rank` | `tol` | `sigma_max * max(rows, cols) * eps` (relative to `sigma_max`) | absolute |
| `numpy.linalg.pinv` | `rcond` | `1e-15` | relative (multiplied by `sigma_max`) |
The proposal is to implement a unified way to specify the absolute or relative tolerances for the truncation of singular values as follows:

Possible choices of `default_rtol`:

- `eps * max(rows, cols)` for `matrix_rank` and `1e-15` for `pinv`
- … for `matrix_rank` but `10 * eps * max(rows, cols)` for `pinv`
- `eps * min(rows, cols)` both for `pinv` and `matrix_rank`

Use of `max(atol, rtol * ...)` for defining the truncation criteria follows `math.isclose`.

Backwards compatibility / NumPy compatibility:
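A sketch of the proposed criterion, assuming the `eps * max(rows, cols)` choice of `default_rtol` and a cutoff of `max(atol, rtol * sigma_max)`; the function name is hypothetical and the implementation is in NumPy for illustration:

```python
import numpy as np

def rank_with_tolerances(a, atol=0.0, rtol=None):
    """Count singular values above max(atol, rtol * sigma_max).

    Sketch of the proposed unified atol/rtol rule; `default_rtol`
    here follows the eps * max(rows, cols) option from the issue.
    """
    s = np.linalg.svd(a, compute_uv=False)
    if rtol is None:
        rtol = np.finfo(a.dtype).eps * max(a.shape[-2:])
    cutoff = max(atol, rtol * s.max())
    return int(np.count_nonzero(s > cutoff))

a = np.diag([1.0, 1e-3, 0.0])

# Default rtol (~6.7e-16 here) removes only the zero singular value.
assert rank_with_tolerances(a) == 2

# An absolute cutoff of 1e-2 removes the 1e-3 singular value as well.
assert rank_with_tolerances(a, atol=1e-2) == 1
```

Note how the `max(atol, rtol * sigma_max)` form mirrors `math.isclose`'s `max(rel_tol * magnitude, abs_tol)` combination, which is the precedent the proposal cites.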
cc @jianyuh @nikitaved @pearu @mruberry @heitorschueroff @walterddr @IvanYashchuk @rgommers