I notice that the XLA Dot operation copies "outer-product style" broadcast semantics from numpy.dot:
| Input | Output | Semantics |
| --- | --- | --- |
| array [p x q x r] `dot` array [s x r x t] | array [p x q x s x t] | array dot product (read below) |
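For concreteness, here is a minimal NumPy sketch of that behavior (the sizes are arbitrary placeholders):

```python
import numpy as np

# Hypothetical placeholder sizes for p, q, r, s, t.
p, q, r, s, t = 2, 3, 4, 5, 6

a = np.ones((p, q, r))
b = np.ones((s, r, t))

# numpy.dot contracts the last axis of `a` against the second-to-last axis of `b`
# and concatenates all remaining axes, "outer-product style".
out = np.dot(a, b)
print(out.shape)  # (2, 3, 5, 6), i.e. [p x q x s x t]
```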
In brief, I think this is a mistake. It would be better to follow the "matmul style" broadcasting semantics of Python's @ operation and NumPy's matmul.
matmul's broadcasting is much more general and, in my opinion, also easier to understand. For example, it can do batch matrix multiplication, but can also still do outer-product-style broadcasting if you insert dummy dimensions of length 1 (the axes do end up in a different order), e.g. (see the NumPy sketch after these examples):
- batch matmul: `[p x q x r] matmul [p x r x t] -> [p x q x t]`
- outer product matmul: `[p x 1 x q x r] matmul [1 x s x r x t] -> [p x s x q x t]`
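Both cases can be reproduced with NumPy's matmul today; a minimal sketch (again with arbitrary placeholder sizes):

```python
import numpy as np

# Hypothetical placeholder sizes for p, q, r, s, t.
p, q, r, s, t = 2, 3, 4, 5, 6

# Batch matmul: leading axes broadcast against each other; the last two axes
# are treated as the matrix dimensions.
a = np.ones((p, q, r))
b = np.ones((p, r, t))
print(np.matmul(a, b).shape)  # (2, 3, 6), i.e. [p x q x t]

# Outer-product style via length-1 dummy axes that broadcast against each
# other; note the batch axes come out first, as [p x s], ahead of [q x t].
a = np.ones((p, 1, q, r))
b = np.ones((1, s, r, t))
print(np.matmul(a, b).shape)  # (2, 5, 3, 6), i.e. [p x s x q x t]
```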
If we could go back in time as NumPy developers, we assuredly would change dot to work this way (now we cannot, because of backwards compatibility concerns). So it would be nice to change this for XLA before we lock in this behavior.
Based on this input and other considerations, we've decided to restrict the semantics of XLA's Dot operation to 1D and 2D arrays in the initial release. We may consider expanding it to higher dimensions in the future, and at that point we'll be considering different possible semantics. For the time being, however, this issue can be closed.