Feature: Batch matmul by dsyme · Pull Request #88 · DiffSharp/DiffSharp · GitHub

Feature: Batch matmul #88

Merged: 29 commits merged into dev on Oct 26, 2020

Conversation

@dsyme (Collaborator) commented Feb 26, 2020

Adjust MatMul to support batching and broadcasting, in both Tensor and the reference implementation

Builds on #85

@dsyme dsyme changed the base branch from dev to feature/expand February 26, 2020 15:11
@dsyme dsyme mentioned this pull request Feb 26, 2020
@dsyme dsyme changed the title from "Batch matmul" to "Feature: Batch matmul" Feb 27, 2020
@gbaydin (Member) left a comment

Transpose is missing tests in TestTensor.fs

I would like to implement the derivative tests to convince myself of the correctness. For the time being, let's add a TODO comment in TestDerivative.fs to record the missing test.

Also it would be great to have some Python reference code.
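
For instance, reference code along these lines (a sketch in PyTorch; the shapes here are arbitrary) could pin down the expected batching and broadcasting behavior:

import torch

a = torch.randn(2, 3, 4, 5)
b = torch.randn(5, 6)

# torch.matmul broadcasts the 2d argument across the leading
# batch dimensions of the 4d argument
c = torch.matmul(a, b)
print(c.shape)  # torch.Size([2, 3, 4, 6])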

@dsyme (Collaborator, Author) commented Apr 22, 2020

@gbaydin Can we get this in, do you think?

@dsyme dsyme changed the base branch from feature/expand to dev April 29, 2020 13:36
@dsyme (Collaborator, Author) commented Apr 29, 2020

I've merged this with dev and it should now be ready (once tests and coverage pass).

@codecov (bot) commented Apr 29, 2020

Codecov Report

Merging #88 into dev will increase coverage by 5.36%.
The diff coverage is 76.47%.


@@            Coverage Diff             @@
##              dev      #88      +/-   ##
==========================================
+ Coverage   67.11%   72.47%   +5.36%     
==========================================
  Files          18       18              
  Lines        5269     5714     +445     
  Branches     1296     1325      +29     
==========================================
+ Hits         3536     4141     +605     
+ Misses       1017      837     -180     
- Partials      716      736      +20     
Impacted Files Coverage Δ
src/DiffSharp.Core/Extensions.fs 58.16% <0.00%> (+3.32%) ⬆️
src/DiffSharp.Core/RawTensor.fs 86.26% <ø> (-1.14%) ⬇️
src/DiffSharp.Core/Tensor.fs 78.20% <ø> (+8.63%) ⬆️
src/DiffSharp.Core/Shape.fs 56.57% <60.00%> (+4.78%) ⬆️
...iffSharp.Backends.Reference/Reference.RawTensor.fs 71.14% <100.00%> (+1.07%) ⬆️
src/DiffSharp.Backends.Torch/Torch.RawTensor.fs 85.81% <100.00%> (-0.75%) ⬇️
src/DiffSharp.Core/DiffSharp.Numerical.fs 77.63% <0.00%> (-3.95%) ⬇️
src/DiffSharp.Core/Data.fs 74.67% <0.00%> (+0.69%) ⬆️
... and 13 more

@dsyme (Collaborator, Author) commented May 4, 2020

@gbaydin I've merged this with dev; it should now be ready.

@gbaydin (Member) commented May 5, 2020

I think the behavior of the transpose operation in this branch is not consistent with PyTorch.

PyTorch transpose behavior is as follows:

a = torch.randn([])
b = torch.randn([10])
c = torch.randn([10,20])
d = torch.randn([10,20,30])
e = torch.randn([10,20,30,40])
at = torch.t(a) # Gives shape []
bt = torch.t(b) # Gives shape [10]
ct = torch.t(c) # Gives shape [20, 10]
dt = torch.t(d) # Fails because d.dim > 2
et = torch.t(e) # Fails because e.dim > 2

DiffSharp behavior in this branch is as follows:

let a = dsharp.randn([])
let b = dsharp.randn([10])
let c = dsharp.randn([10;20])
let d = dsharp.randn([10;20;30])
let e = dsharp.randn([10;20;30;40])
let at = a.transpose() // Fails because a.dim < 2
let bt = b.transpose() // Fails because b.dim < 2
let ct = c.transpose() // Gives shape [20; 10]
let dt = d.transpose() // Gives shape [10; 30; 20]
let et = e.transpose() // Gives shape [10; 20; 40; 30]

I believe there is no need for a dedicated "batch-transpose" operation: batch transposition can be achieved once we have the general transpose operation torch.transpose(input, dim0, dim1), which can freely swap any two dimensions of a tensor of any shape. https://pytorch.org/docs/stable/torch.html#torch.transpose
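
For example (a quick PyTorch sketch), swapping the last two dimensions with the general transpose reproduces the batch-transpose behavior above:

import torch

d = torch.randn(10, 20, 30)
dt = torch.transpose(d, 1, 2)  # equivalently d.transpose(-2, -1)
print(dt.shape)                # torch.Size([10, 30, 20])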

I think this batch-transpose behavior was needed mainly for the reverse mode of batch matrix multiplication. We can either:

  • keep this and rename it to something like batchTranspose (and make it internal?) or
  • implement the general transpose(dim0, dim1) (which is currently missing and is a needed operation anyway) and use it for this purpose.

I would go with the second option because it improves the API. In either case we can have the regular transpose (without dim0, dim1) behave the same way as PyTorch's.

@dsyme (Collaborator, Author) commented May 6, 2020

Yup, agreed. There's also torch.permute, which is even more general, right?

https://stackoverflow.com/questions/57512113/how-to-take-a-transpose-for-each-matrix-in-a-batch-in-pytorch
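
For reference, a batch transpose via permute might look like this (sketch):

import torch

e = torch.randn(10, 20, 30, 40)
et = e.permute(0, 1, 3, 2)  # permute takes a full reordering of all dims
print(et.shape)             # torch.Size([10, 20, 40, 30])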

@dsyme (Collaborator, Author) commented May 6, 2020

keep this and rename it to something like batchTranspose (and make it internal?)

This might be the simplest option for now, to get the batch matmul implementation and tests consolidated, and then deal with the general transpose separately?

@gbaydin (Member) commented May 6, 2020

keep this and rename it to something like batchTranspose (and make it internal?)

This might be the simplest option for now, to get the batch matmul implementation and tests consolidated, and then deal with the general transpose separately?

OK, agreed. Let's go with renaming the current transpose to batchTranspose and reintroducing a separate transpose that is identical to PyTorch's t. We can deal with the general transpose(dim0, dim1) separately.

@gbaydin (Member) commented May 6, 2020

@gbaydin Can we relax the code coverage settings? It's really tricky to keep them going up all the time: "63.49% of diff hit (target 63.67%)"

These are the default settings and I agree they are too strict. I looked into relaxing them before but it was surprisingly difficult to find the information. Let me look a bit better and fix it. :)

@dsyme (Collaborator, Author) commented May 6, 2020

These are the default settings and I agree they are too strict. I looked into relaxing them before but it was surprisingly difficult to find the information. Let me look a bit better and fix it. :)

If you like, give me admin rights and I can poke around.

@gbaydin (Member) commented May 6, 2020

These are the default settings and I agree they are too strict. I looked into relaxing them before but it was surprisingly difficult to find the information. Let me look a bit better and fix it. :)

If you like, give me admin rights and I can poke around.

I think you already have it. Codecov uses GitHub permissions (https://codecov.io/gh/DiffSharp/DiffSharp/). Please let me know if it doesn't work.

@gbaydin (Member) commented May 6, 2020

I think we need to set the project/patch targets to custom values using https://github.com/DiffSharp/DiffSharp/blob/dev/codecov.yml. See here: https://docs.codecov.io/docs/commit-status

Edit: I made some changes to the threshold values. It should now allow coverage to fall by up to 10% in a PR before failing. I don't know if it will work as intended; I guess we will see.
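
The change would presumably look something like this in codecov.yml (a sketch based on the Codecov docs; the exact keys and values in the repo may differ):

coverage:
  status:
    project:
      default:
        threshold: 10%  # allow project coverage to drop by up to 10% before failing
    patch:
      default:
        threshold: 10%  # same tolerance for the diff/patch coverage check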

@gbaydin (Member) commented May 6, 2020

Inspecting this further, I have the following concerns about the PR:

  • The behavior of DiffSharp matmul is not the same as torch.matmul, which special-cases by the dimensionality of the arguments: a dot product for the 1d-1d combination, a matrix-vector product for 2d-1d, an ordinary matmul for 2d-2d, and a broadcasting batched matmul (bmm, see below) for higher dimensions.
  • In torch there is a torch.bmm that does batch matmul only for 3d tensors. DiffSharp matmul is not the same as this operation either, because it works with >=2d tensors and supports broadcasting.

I think if we can get matmul to behave identically to torch.matmul, it would be great. Introducing a bmm that works with 3d-3d only is a minor thing and should be straightforward.
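
For reference, a quick PyTorch sketch of those special cases:

import torch

v = torch.randn(10)
m = torch.randn(5, 10)
b = torch.randn(3, 5, 10)

print(torch.matmul(v, v).shape)                  # 1d-1d: dot product, shape []
print(torch.matmul(m, v).shape)                  # 2d-1d: matrix-vector product, shape [5]
print(torch.matmul(m, m.t()).shape)              # 2d-2d: ordinary matmul, shape [5, 5]
print(torch.matmul(b, b.transpose(1, 2)).shape)  # 3d-3d: batched matmul, shape [3, 5, 5]
print(torch.matmul(m, b.transpose(1, 2)).shape)  # 2d-3d: broadcast then batch, shape [3, 5, 5]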

I had other comments about the change of MatMulT2T2 to MatMulTT in the RawTensor API, but I think this is actually a good simplification and we should keep it. An alternative I considered was to have MatMulT2T2 and MatMulT3T3 map to gemm and gemm_batch style operations in CUDA, MKL, etc. (https://devblogs.nvidia.com/cublas-strided-batched-matrix-multiply/). But I can see why MatMulTT is done this way in this PR: it works with the broadcasting expansions. In the backends, we can make the best calls to gemm, gemm_batch, etc. from within MatMulTT.

Another thing we need to think about is whether to introduce MatMulT2T1 or MatMulT1T1 (or DotTT) style RawTensor operations for matrix-vector and dot products. These would correspond to sgemv and sdot at the low level. But handling these through the same MatMulTT method and calling the necessary low-level routine from there should probably be fine. At the Tensor level these correspond to torch.mv and torch.dot.
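
In PyTorch terms, the equivalences would be (sketch):

import torch

v = torch.randn(10)
m = torch.randn(5, 10)

# matmul subsumes both specialized ops, so a single MatMulTT entry point
# could dispatch to sgemv/sdot-style kernels internally
assert torch.allclose(torch.mv(m, v), torch.matmul(m, v))
assert torch.allclose(torch.dot(v, v), torch.matmul(v, v))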

@dsyme (Collaborator, Author) commented May 7, 2020

Inspecting this further, I have the following concerns about the PR:

Cool, makes sense

@dsyme dsyme closed this May 8, 2020
@dsyme dsyme reopened this May 8, 2020
@gbaydin (Member) commented May 13, 2020

Some relevant discussions here: pytorch/pytorch#18027 and here: tensorflow/tensorflow#5523

@dsyme (Collaborator, Author) commented Sep 9, 2020

@gbaydin

I think if we can get matmul to behave identical to torch.matmul, it would be great. Introducing a bmm that works with 3d-3d only is a minor thing and should be straightforward.

I have made and tested this adjustment based on the PyTorch docs for matmul, and this should now be ready.

Another thing that we need to think about is whether to introduce MatMulT2T1 or MatMulT1T1 (or DotTT) ...

I haven't done these parts.

@gbaydin (Member) commented Sep 9, 2020

Thank you, I will have a look and merge soon.

@dsyme (Collaborator, Author) commented Sep 29, 2020

@gbaydin ping :)

@dsyme dsyme mentioned this pull request Oct 20, 2020
@gbaydin gbaydin merged commit 083ac37 into dev Oct 26, 2020
@gbaydin (Member) commented Oct 26, 2020

Finally merged this (with a lot of delay!)

@dsyme (Collaborator, Author) commented Nov 3, 2020

Something went wrong with the overall diff for this in Tensor.fs, which is showing 6000+ line changes (nearly all whitespace).

This merge resulted in a complete rewrite of the file; I'm not sure why: 3c94672

I'll try to undo this and force-push a simpler diff.

@dsyme dsyme mentioned this pull request Nov 3, 2020
dsyme added a commit that referenced this pull request Nov 3, 2020