Batched SVD using cuSolver #14175
Comments
This seems like a pretty good thing to add. @vishwakftw, can you mentor @jjbouza to get this done?
Hello @vishwakftw, thanks for the help. The easy solution is to fall back to the CPU for matrices larger than 32x32, but there might be other options. For example, using CUDA streams I think we can parallelize the regular (non-batched) SVD operation provided by cuSolver. AFAIK this is what was done before cuSolver provided a batched SVD natively. For an example of this approach, see e.g. the first answer here. I would need to test and benchmark it, though.
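As a rough CPU analogue of the "one SVD per CUDA stream" idea, each per-matrix factorization can be submitted to its own worker so the decompositions overlap. This is only an illustrative sketch: `np.linalg.svd` stands in for cuSolver's non-batched `gesvd`, and `parallel_svd` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def parallel_svd(a, workers=4):
    """Run one SVD per batch slice on its own worker, then stack the
    results back into batch form (numpy stands in for cuSolver here)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(np.linalg.svd, a))  # one task per slice
    u, s, vt = (np.stack(x) for x in zip(*results))
    return u, s, vt

rng = np.random.default_rng(0)
a = rng.standard_normal((8, 16, 16))
u, s, vt = parallel_svd(a)
print(u.shape, s.shape, vt.shape)  # (8, 16, 16) (8, 16) (8, 16, 16)
```

Since numpy releases the GIL inside LAPACK calls, threads give real overlap here, loosely mirroring how independent streams overlap kernels on the GPU.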
I think it wouldn't be wise to penalize users by asking them to use the CPU for matrices larger than 32x32; I am instead advocating for a uniform API across CPU and GPU. Your idea utilizing …
Agreed, using MAGMA plus a for-loop on the GPU would work. I can benchmark the cuSolver CUDA streams approach against this. First I'm going to implement the batched cuSolver routine for matrices up to 32x32. I'll model this on your batched inverse implementation, so I'll let you know if I have any questions.
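The "MAGMA plus a for-loop" plan amounts to one factorization per batch slice with the results stacked back together. A minimal sketch of those semantics, with `np.linalg.svd` standing in for the per-matrix MAGMA call (`batched_svd_loop` is a hypothetical name):

```python
import numpy as np

def batched_svd_loop(a):
    """Emulate batched SVD with one per-matrix call per slice of a
    (batch, n, n) array, stacking the factors back into batch form."""
    out = [np.linalg.svd(m) for m in a]
    u, s, vt = (np.stack(x) for x in zip(*out))
    return u, s, vt

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 8, 8))
u, s, vt = batched_svd_loop(a)

# Singular values are unique, so the loop must agree with a native
# batched call on the stacked array.
print(np.allclose(s, np.linalg.svd(a, compute_uv=False)))  # True
```

This is why the loop fallback and a native batched kernel can share one uniform API: they produce the same factors, differing only in how the work is scheduled.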
Great, that sounds good to me! Please feel free to ping here or on Slack :)
Hey @vishwakftw, like you said, it doesn't appear that the code base has any support for cuSolver. To add this I'm going to need to create the cuSolver handles and pass them around to the cuSolver calls. I've been looking through the cuBLAS code to get an idea of how to do this, so I just want to make sure I've got it straight. Here's the idea:

- Add a …
- Add a …

Does this look like the right pattern for cuSolver integration? Thanks
Hey @jjbouza. This path sounds good to me!
One quick comment: if possible, try not to add new things to THC, but instead do them directly in ATen.
@jjbouza any updates? |
@vishwakftw Have been slowly making progress. I should have something soon |
Would the batched SVD still be differentiable? |
Yes, it should be. It is effectively batch_count SVDs performed at once; I don't see why it shouldn't be differentiable. Of course, there is always the edge case of ill-conditioned matrices.
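The differentiability claim can be checked numerically. Away from repeated singular values, each singular value satisfies the first-order relation dσᵢ = uᵢᵀ dA vᵢ, and since a batched SVD is just independent per-matrix SVDs, the same holds slice by slice. A small numpy check against a central finite difference (the setup here is illustrative, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal((5, 5))
u, s, vt = np.linalg.svd(a)

# Perturbation direction and a central finite difference of the
# singular values along it.
e = rng.standard_normal((5, 5))
eps = 1e-6
s_plus = np.linalg.svd(a + eps * e, compute_uv=False)
s_minus = np.linalg.svd(a - eps * e, compute_uv=False)
fd = (s_plus - s_minus) / (2 * eps)

# Analytic directional derivative: d(sigma_i) = u_i^T e v_i.
analytic = np.array([u[:, i] @ e @ vt[i] for i in range(5)])
print(np.allclose(fd, analytic, atol=1e-5))
```

The ill-conditioned edge case mentioned above shows up exactly when singular values (nearly) coincide: the denominator terms in the gradient of U and V blow up, even though the singular values themselves stay well behaved.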
this would be great to have! |
cc: @jjbouza. |
Hey, guys. I've implemented a batched version of SVD with cuSolver as a standalone package, Torch-batch-svd: https://github.com/KinglittleQ/torch-batch-svd
This is awesome! Thanks for doing this. For inputs larger than 32x32, we can probably perform the SVD computation in a loop. For CPU, this has to be performed in a loop anyway, so I think this is a pretty good implementation! Do you mind sending in a PR for this? I'd be happy to clarify any questions that you may have about porting this to PyTorch / ATen.
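The plan here is a size-based dispatch: the native batched kernel for matrices within cuSolver's 32x32 limit, and a per-matrix loop as the fallback. A hedged sketch of that dispatch, where `np.linalg.svd` stands in for both GPU paths and `batched_svd` is a hypothetical name:

```python
import numpy as np

CUSOLVER_BATCH_LIMIT = 32  # cuSolver's batched SVD handles at most 32 x 32

def batched_svd(a):
    """Dispatch sketch: batched path for small matrices, loop fallback
    otherwise. numpy stands in for both cuSolver code paths."""
    if max(a.shape[-2], a.shape[-1]) <= CUSOLVER_BATCH_LIMIT:
        return np.linalg.svd(a)              # stand-in for batched kernel
    out = [np.linalg.svd(m) for m in a]      # one SVD per slice
    return tuple(np.stack(x) for x in zip(*out))

rng = np.random.default_rng(0)
small = rng.standard_normal((3, 16, 16))
large = rng.standard_normal((3, 64, 64))
u1, s1, vt1 = batched_svd(small)
u2, s2, vt2 = batched_svd(large)
print(s1.shape, s2.shape)  # (3, 16) (3, 64)
```

Because both branches return the same `(U, S, Vt)` shapes, callers never see which path ran, which is the uniform-API property argued for earlier in the thread.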
I can try to port it into PyTorch if you could tell me where I should add the forward and backward functions. But I'm not sure I have time to complete it, as there's other work for me to do right now.
Is it fine if I complete the port on your behalf? |
Of course, it's OK. |
Thank you, I'll ping you when I send in the pull request. |
@vishwakftw Hi, can you tell me if there's a resolution to this thread? |
Hi @SaiK95. Sorry I haven’t been able to get to this in the past few months due to college. I’ll get to it in the summer, within a month. Currently, the best and only possible way to do it would be to run a loop. |
@vishwakftw No problem, thanks for the quick reply! |
Closing this since the feature is available on master. The current implementation uses sequential MAGMA calls in a for-loop. |
It seems there are several people working on batch mode linear algebra routines, i.e. #11796 and #14071 are active.
Any plans for adding a batch mode SVD? This would be useful for certain implementations of group equivariant networks.
I'm not completely familiar with the PyTorch codebase, but if I'm not mistaken the usual backend used for linear algebra computations on the GPU is MAGMA. I don't think MAGMA implements a batch SVD operation, but cuSolver does for small matrices (max 32x32). For larger matrices we can just fall back to the current approach.
If no one else is planning on working on this I can take a look at it. The correct way to do this would be to model something like #9949, right?
I realize several others have made similar suggestions: #10172, #4689. Those issues don't seem active however.