Batched SVD using cuSolver #14175
Comments
This seems like a pretty good thing to add. @vishwakftw, can you mentor @jjbouza to get this done?
Hello @vishwakftw, thanks for the help. The easy solution is to fall back to the CPU for matrices larger than 32x32, but there might be other options. For example, using CUDA streams I think we can parallelize the regular (non-batched) SVD operation provided by cuSolver. AFAIK this is what was done before cuSolver provided a batched SVD natively. For an example of this approach, see e.g. the first answer here. I would need to test and benchmark it, though.
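As a rough CPU analogue of the "one SVD per CUDA stream" idea, each per-matrix factorization can be submitted to its own worker so the decompositions overlap. This is only an illustrative sketch: `np.linalg.svd` stands in for cuSolver's non-batched `gesvd`, and `parallel_svd` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def parallel_svd(a, workers=4):
    """Run one SVD per batch slice on its own worker, then stack the
    results back into batch form (numpy stands in for cuSolver here)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(np.linalg.svd, a))  # one task per slice
    u, s, vt = (np.stack(x) for x in zip(*results))
    return u, s, vt

rng = np.random.default_rng(0)
a = rng.standard_normal((8, 16, 16))
u, s, vt = parallel_svd(a)
print(u.shape, s.shape, vt.shape)  # (8, 16, 16) (8, 16) (8, 16, 16)
```

Since numpy releases the GIL inside LAPACK calls, threads give real overlap here, loosely mirroring how independent streams overlap kernels on the GPU.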
I think it wouldn't be wise to penalize users by asking them to use the CPU for matrices larger than 32x32; I am instead advocating for a uniform API across CPU and GPU. Your idea utilizing …
Agreed, using MAGMA plus a for-loop on the GPU would work. I can benchmark the cuSolver CUDA streams approach against this. First I'm going to implement the batched cuSolver routine for matrices up to 32x32. I'll model this on your batched inverse implementation, so I'll let you know if I have any questions.
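The "MAGMA plus a for-loop" plan amounts to one factorization per batch slice with the results stacked back together. A minimal sketch of those semantics, with `np.linalg.svd` standing in for the per-matrix MAGMA call (`batched_svd_loop` is a hypothetical name):

```python
import numpy as np

def batched_svd_loop(a):
    """Emulate batched SVD with one per-matrix call per slice of a
    (batch, n, n) array, stacking the factors back into batch form."""
    out = [np.linalg.svd(m) for m in a]
    u, s, vt = (np.stack(x) for x in zip(*out))
    return u, s, vt

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 8, 8))
u, s, vt = batched_svd_loop(a)

# Singular values are unique, so the loop must agree with a native
# batched call on the stacked array.
print(np.allclose(s, np.linalg.svd(a, compute_uv=False)))  # True
```

This is why the loop fallback and a native batched kernel can share one uniform API: they produce the same factors, differing only in how the work is scheduled.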
Great, that sounds good to me! Please feel free to ping here or on Slack :)
Hey @vishwakftw, like you said, it doesn't appear that the code base has any support for cuSolver. To add this I'm going to need to create the cuSolver handles and pass them around to the cuSolver calls. I've been looking through the cuBLAS code to get an idea of how to do this, so I just want to make sure I've got it straight. Here's the idea:

- Add a …
- Add a …

Does this look like the right pattern for cuSolver integration? Thanks
Hey @jjbouza. This path sounds good to me!
One quick comment: if possible, try not to add new things to THC, but instead do them directly in ATen.
@jjbouza any updates? |
@vishwakftw Have been slowly making progress. I should have something soon |
Would the batched SVD still be differentiable? |
Yes, it should be. It is effectively batch_count SVDs performed at once; I don't see why it shouldn't be differentiable. Of course, there is always the edge case of ill-conditioned matrices.
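The differentiability claim can be checked numerically. Away from repeated singular values, each singular value satisfies the first-order relation dσᵢ = uᵢᵀ dA vᵢ, and since a batched SVD is just independent per-matrix SVDs, the same holds slice by slice. A small numpy check against a central finite difference (the setup here is illustrative, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal((5, 5))
u, s, vt = np.linalg.svd(a)

# Perturbation direction and a central finite difference of the
# singular values along it.
e = rng.standard_normal((5, 5))
eps = 1e-6
s_plus = np.linalg.svd(a + eps * e, compute_uv=False)
s_minus = np.linalg.svd(a - eps * e, compute_uv=False)
fd = (s_plus - s_minus) / (2 * eps)

# Analytic directional derivative: d(sigma_i) = u_i^T e v_i.
analytic = np.array([u[:, i] @ e @ vt[i] for i in range(5)])
print(np.allclose(fd, analytic, atol=1e-5))
```

The ill-conditioned edge case mentioned above shows up exactly when singular values (nearly) coincide: the denominator terms in the gradient of U and V blow up, even though the singular values themselves stay well behaved.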
this would be great to have! |
cc: @jjbouza. |
Hey, guys. I've implemented a batched version of SVD with cuSolver as a standalone package, Torch-batch-svd: https://github.com/KinglittleQ/torch-batch-svd
This is awesome! Thanks for doing this. For inputs larger than 32x32, we can probably perform the SVD computation in a loop. For CPU, this has to be performed in a loop anyway, so I think this is a pretty good implementation! Do you mind sending in a PR for this? I'd be happy to clarify any questions that you may have about porting this to PyTorch / ATen.
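The plan here is a size-based dispatch: the native batched kernel for matrices within cuSolver's 32x32 limit, and a per-matrix loop as the fallback. A hedged sketch of that dispatch, where `np.linalg.svd` stands in for both GPU paths and `batched_svd` is a hypothetical name:

```python
import numpy as np

CUSOLVER_BATCH_LIMIT = 32  # cuSolver's batched SVD handles at most 32 x 32

def batched_svd(a):
    """Dispatch sketch: batched path for small matrices, loop fallback
    otherwise. numpy stands in for both cuSolver code paths."""
    if max(a.shape[-2], a.shape[-1]) <= CUSOLVER_BATCH_LIMIT:
        return np.linalg.svd(a)              # stand-in for batched kernel
    out = [np.linalg.svd(m) for m in a]      # one SVD per slice
    return tuple(np.stack(x) for x in zip(*out))

rng = np.random.default_rng(0)
small = rng.standard_normal((3, 16, 16))
large = rng.standard_normal((3, 64, 64))
u1, s1, vt1 = batched_svd(small)
u2, s2, vt2 = batched_svd(large)
print(s1.shape, s2.shape)  # (3, 16) (3, 64)
```

Because both branches return the same `(U, S, Vt)` shapes, callers never see which path ran, which is the uniform-API property argued for earlier in the thread.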
I can try to port it into PyTorch if you could tell me where I should add the forward and backward functions. But I'm not sure I have time to complete it, as there's other work for me to do right now.
Is it fine if I complete the port on your behalf? |
Of course, it's OK. |
Thank you, I'll ping you when I send in the pull request. |
@vishwakftw Hi, can you tell me if there's a resolution to this thread? |
Hi @SaiK95. Sorry I haven’t been able to get to this in the past few months due to college. I’ll get to it in the summer, within a month. Currently, the best and only possible way to do it would be to run a loop. |
@vishwakftw No problem, thanks for the quick reply! |
Closing this since the feature is available on master. The current implementation uses sequential MAGMA calls in a for-loop. |
It seems there are several people working on batch mode linear algebra routines, i.e. #11796 and #14071 are active.
Any plans for adding a batch mode SVD? This would be useful for certain implementations of group equivariant networks.
I'm not completely familiar with the PyTorch codebase, but if I'm not mistaken the usual backend used for linear algebra computations on the GPU is MAGMA. I don't think MAGMA implements a batch SVD operation, but cuSolver does for small matrices (max 32x32). For larger matrices we can just fall back to the current approach.
If no one else is planning on working on this I can take a look at it. The correct way to do this would be to model something like #9949, right?
I realize several others have made similar suggestions: #10172, #4689. Those issues don't seem active however.