Optimization of `a @ a.H` style operations using BLAS herk routines · Issue #6951 · numpy/numpy · GitHub
jakirkham opened this issue Jan 6, 2016 · 26 comments
@jakirkham
Contributor

Related: #6794
Related: #6932

In addition to optimizing a @ a.T using syrk, there should be a way to optimize a @ a.H using hyrk. This is a bit tricky as a.H is not a view. So, it cannot be done the same way that was done for a @ a.T ( #6932 ). There are number of proposals for handling this optimization in some fashion, but one should be picked.

In addition to optimizing a @ a.T using syrk, there should be a way to optimize a @ a.H using herk. This is a bit tricky as a.H is not a view. So, it cannot be done the same way that was done for a @ a.T ( #6932 ). There are a number of proposals for handling this optimization in some fashion, but one should be picked.

  1. Create a "transposed-complex" type. ( #6794 (comment) )
  2. Add a function for all a @ a.T and a @ a.H that takes a and an argument for whether the conj needs to be taken. ( #6794 (comment) )
  3. Add a keyword argument to dot that determines whether to do a conjugated dot product or not. ( #6794 (comment) )
  4. Change np.dot to allow it to take a single argument to allow for the same behavior presented in 2.
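For concreteness, the optimization at stake can be sketched with scipy's low-level BLAS wrappers (assuming scipy is available): zherk is the complex herk routine, and it fills only one triangle of the Hermitian result a @ a.conj().T.

```python
import numpy as np
from scipy.linalg import blas

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

# The general-purpose path: forms the conjugate transpose explicitly.
full = a @ a.conj().T

# zherk computes alpha * a @ a.conj().T but only writes one triangle,
# exploiting the fact that the result is Hermitian (lower=0 -> upper).
upper = blas.zherk(1.0, a)

assert np.allclose(upper, np.triu(full))
# The result is Hermitian, so the other triangle is recoverable for free:
assert np.allclose(full, full.conj().T)
```

This is only an illustration of the BLAS semantics; the issue is about dispatching to herk from within numpy itself, without the user forming a.H.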

Raising to the attention of the previously interested parties: @njsmith @charris @argriffing

@charris
Member
charris commented Jan 6, 2016

Adding an argument to dot would not help @, which is where we are trying to go. OTOH, it does provide a function, if somewhat indirectly...

@jakirkham
Contributor Author

Definitely agree with that statement. That being said, it is the easiest to implement out of the three. Though 2 doesn't seem that bad really and is a nice balance of the objectives.

@mhvk
Contributor
mhvk commented Jan 6, 2016

I quite like the "transposed-complex" type (conjugated64 and conjugated128?), as it would make conj a very cheap operation. But it does seem it would imply quite a lot of work, between all the ufunc loops, FFTs, etc. There is also substantial risk of confusion...

@seberg
Member
seberg commented Jan 6, 2016

Would it be all so bad to have an np.linalg.blah function for this kind of operation?

EDIT: Admittedly, linalg mostly only works for numerical types; it would be good to support this also for object dtype, since we actually use this kind of function ourselves in correlate, etc., I think.

@njsmith
Member
njsmith commented Jan 7, 2016

@mhvk: You wouldn't need to rewrite all the ufunc loops, just the casting functions (so log(conjugated_array) would automatically cast to regular complex and then perform the operation in that space). You're right that this would probably shake out all kinds of corner-case bugs though...

@seberg: the python object protocol has no "conjugate" operation so handling object types is kinda doomed regardless :-/

I guess my vote is kinda leaning towards adding a conjugate flag to dot... it's true that it doesn't help with @, but the only thing that could help with @ is the conjugated-type idea, and that's a much bigger investment than just adding a kwarg. And it's okay if not everything uses @ -- @ is important because it makes 90% of cases more readable, but if you have special needs like optimizing an A @ A.T.conj() operation then having to write dot(A, A.T, conjugate="right") is not so bad. And the advantage compared to a crossprod-type function is that (1) it also works for general conjugation (A @ B.conj(), not just A @ A.T.conj()), and (2) it avoids cluttering the namespace with more special-purpose dot-like functions...
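To make the keyword proposal concrete, here is a minimal pure-Python sketch of what a conjugate flag on dot could mean semantically. The name dot_conj and the "left"/"right" values are hypothetical; a real implementation would pass a flag down to gemm rather than materialize the conjugate.

```python
import numpy as np

def dot_conj(a, b, conjugate=None):
    """Hypothetical sketch: conjugate may be None, 'left', or 'right';
    the named operand is implicitly conjugated, as BLAS gemm can do
    without an explicit copy."""
    if conjugate == "left":
        a = a.conj()   # a real implementation would set gemm's trans flag
    elif conjugate == "right":
        b = b.conj()
    elif conjugate is not None:
        raise ValueError("conjugate must be None, 'left', or 'right'")
    return a @ b

A = np.array([[1 + 2j, 3 - 1j], [0 + 1j, 2 + 0j]])
assert np.allclose(dot_conj(A, A.T, conjugate="right"), A @ A.conj().T)
assert np.allclose(dot_conj(A.T, A, conjugate="left"), A.conj().T @ A)
```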

@mhvk
Contributor
mhvk commented Jan 7, 2016

Would it be an idea to adapt np.matmul instead of np.dot? It is newer and still claims to be experimental, so there is a bit more freedom to change things. Also, the examples above are specifically for matrix multiplication (dot immediately implies inner vector product for me, so if I were new to numpy, I wouldn't look there if I were trying to find how to do A @ A.T).

That said, I cannot really think of keyword argument names that would lead to code that would be more obvious than just using specific functions. In that sense, @seberg's alternative of adding such functions in np.linalg seems sensible, as it avoids further crowding the np namespace. And of course one can just refer to these in the matmul documentation.

p.s. Now if only vdot did not implicitly flatten arrays... It already does the conj, albeit of the first argument... http://docs.scipy.org/doc/numpy/reference/generated/numpy.vdot.html
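A quick numpy-only illustration of the vdot behavior mentioned above: it conjugates the first argument but flattens both inputs, so it returns a scalar and cannot stand in for A.H @ B.

```python
import numpy as np

a = np.array([[1 + 1j, 2 + 2j], [3 + 3j, 4 + 4j]])
b = np.array([[5 + 5j, 6 + 6j], [7 + 7j, 8 + 8j]])

# vdot conjugates its first argument, but flattens both inputs first:
s = np.vdot(a, b)
assert np.isclose(s, np.dot(a.conj().ravel(), b.ravel()))

# The matrix-shaped result wanted here needs the explicit conjugate:
m = a.conj().T @ b
assert m.shape == (2, 2)
```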

@charris
Member
charris commented Jan 7, 2016

@mhvk vdot is lost in the woods without a vector. Its usefulness is limited.

@seberg
Member
seberg commented Jan 7, 2016

@njsmith yes, but I still dislike that our conjugate just ignores object arrays (and if someone puts complex numbers into object arrays, some things will just be wrong). The numbers protocol does suggest implementing the .conjugate method. I am not really suggesting we attempt to use it, since I am not sure it makes much sense. In the end, the thing I could most imagine comes back to new Python-type-specific dtypes, which would make testing for such a method more reasonable. Anyway, different issue :( and quite far-fetched.

The reason I suggested that was mostly because I thought this function might only take a single argument anyway. But I guess even A @ B.conjugate could be added to dot and might be pretty nice. If that is nice, there is no point in adding a new optimization function when it already is optimized (even if dot(A, A.T, conjugate="right") may not be quite obvious when A.H is "obvious").

@ewmoore
Contributor
ewmoore commented Jan 7, 2016

np.conjugate and ndarray.conjugate both handle object arrays.

In [6]: a = np.array([1+1j, 2+2j, 3+3j], object)

In [7]: a
Out[7]: array([(1+1j), (2+2j), (3+3j)], dtype=object)

In [8]: np.conjugate(a)
Out[8]: array([(1-1j), (2-2j), (3-3j)], dtype=object)

In [9]: a.conjugate()
Out[9]: array([(1-1j), (2-2j), (3-3j)], dtype=object)

This was (in part) #4887. Note that this will error if the object array contains something that doesn't have a conjugate method, but it does work for all of the python builtin types.
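A quick check of the behavior described above (the exact exception type may vary between numpy versions, so both AttributeError and TypeError are caught here):

```python
import numpy as np

# Object arrays are conjugated element-wise via each element's
# .conjugate() method, which all Python builtin number types have:
ok = np.array([1 + 1j, 2], dtype=object)
assert list(np.conjugate(ok)) == [1 - 1j, 2]

# Elements lacking a .conjugate method raise instead:
bad = np.array(["not a number"], dtype=object)
try:
    np.conjugate(bad)
except (AttributeError, TypeError):
    pass
else:
    raise AssertionError("expected an error for objects without .conjugate")
```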

@seberg
Member
seberg commented Jan 7, 2016

Hmmm... My guess is that it was not yet in a release? I hope it will work out ;). But it breaks any code that uses conjugate with very weird numbers or non-numbers.

@jakirkham
Contributor Author

@seberg, are you referring to @ewmoore's contribution? It was introduced in 1.9.0 and has been present since.

@jakirkham
Contributor Author

So, how about an even simpler option? Allow numpy.dot to take a single array argument.

When this happens, interpret numpy.dot(A) as A @ A.H. The other ordering ( A.H @ A ) can be gotten by doing numpy.dot(A.H). If the matrices are not complex, A.H is just A.T. We could add a keyword conjugate=True. If this is set to False, it will reproduce the cases A @ A.T and A.T @ A regardless of array type.

Here is why I think this is a good idea:

  1. Meets case 2.
  2. Meets case 3.
  3. Keeps the syntax simple.
  4. Doesn't add a new dot type function.
  5. Allows this optimization to be close to existing optimizations and easily leveraged.
  6. Avoids far reaching changes to existing code.
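A pure-Python sketch of the proposed semantics (the name dot_self is hypothetical; the real version would dispatch to herk/syrk instead of materializing the transpose):

```python
import numpy as np

def dot_self(A, conjugate=True):
    """Hypothetical sketch of the proposed single-argument form:
    dot_self(A) means A @ A.H (dispatchable to BLAS herk), and
    conjugate=False falls back to A @ A.T (BLAS syrk)."""
    At = A.conj().T if conjugate else A.T
    return A @ At

A = np.array([[1 + 1j, 2], [0, 1 - 1j]])
assert np.allclose(dot_self(A), A @ A.conj().T)
assert np.allclose(dot_self(A, conjugate=False), A @ A.T)
# The reversed ordering A.H @ A falls out of passing A.H itself:
assert np.allclose(dot_self(A.conj().T), A.conj().T @ A)
```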

@ewmoore
Contributor
ewmoore commented Jan 7, 2016

If we address a @ a.H it would be a terrible shame if a.H @ b was still impossible without making a copy of a. I'm +1 for a keyword to dot. I would absolutely make use of this were it available.

@ewmoore
Contributor
ewmoore commented Jan 7, 2016

@jakirkham seems awfully magical though.

@jakirkham
Contributor Author

@ewmoore, why do you say that? It is just dispatching to herk or syrk.

@ewmoore
Contributor
ewmoore commented Jan 7, 2016

I don't mean the implementation, but the Python interface. It isn't anywhere close to obvious what np.dot(a) is supposed to do. np.dot(a, a.T) is very clear. np.dot(a, a.T, conjugate='right') is pretty clear.

@jakirkham
Contributor Author

I guess I am not thrilled with the keyword argument because we then have to ensure we handle cases like np.dot(a, b, conjugate="right") where a and b are two totally different arrays. This makes the code in cblas_matrixproduct a bit more complex (more possible bugs) handling cases that we don't really benefit from.

Alternatively, we actually add a new function that does handle this case.

@mhvk
Contributor
mhvk commented Jan 7, 2016

@jakirkham - I was thinking about the same solution yesterday, but didn't pursue it because readability really counts. The only way out I could think of is having some string placeholder like self.T.conj for the second object (which would make a bit more sense if one considers the method rather than the function, i.e., a.dot(b='self.T'), but is still very ugly).

That's why above I ended up concluding that a special function in linalg might still be the best solution, so that at least we offer the benefit of the speed to those who need it.

p.s. Then again, there has been some talk about allowing some combinations or chaining of ufuncs (I think abs and max was one) -- but that is for a longer-term future.

@mhvk
Contributor
mhvk commented Jan 7, 2016

p.s. The fact that np.dot can be overridden by __numpy_ufunc__ makes me extra wary about allowing one to omit its second argument, replacing it with something non-array-like, or even introducing additional keywords in its signature. It should really continue to look and behave as much as possible like a normal ufunc, otherwise things are just going to break in unexpected places.

@jakirkham
Contributor Author

Maybe this is just me, but when I see a signature like np.dot(A) and know what np.dot does normally, I think it must be some form of matrix multiplication of A with itself. The only ways for this to work on an arbitrary matrix are A @ A.T and A @ A.H, or the reversed versions. However, as I have not done np.dot(A.T) or np.dot(A.H), I would assume the reversed ones are out. If the matrix is real, the two options are the same. If it is complex, I would expect A @ A.H, as I so rarely see A @ A.T with complex matrices. I added the conjugate argument to make this explicit and to allow the option to be changed if someone really wanted A @ A.T, but I would be just as happy dropping it.

As for a method version, I was thinking A.dot() with the same behavior. It could have a conjugate=True default keyword argument for the same reasons, but again I am not attached to this.

However, this is just how I think. If people feel this is still unnatural, then we need something else.

If we are really concerned that adding or changing arguments to np.dot, then I would definitely lean towards a new function name being added, but with the same behavior.

Though adding a new conjugated complex type makes A @ A.H work, I am worried about all the other fallout from such a change. This includes potential confusion and misuse of the type.

@njsmith
Member
njsmith commented Jan 7, 2016

@jakirkham: providing support for A @ B.H does require extra work, but that's because it provides extra functionality :-). gemm has options to implicitly conjugate the left and/or right argument that aren't currently available from numpy, so we'd genuinely be exposing new speedup opportunities. I guess there is some question of whether these speedups are actually useful. I'm guessing that they are from the fact that the BLAS designers felt it worth including, and from @ewmoore requesting the functionality above.
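The gemm options in question can be seen through scipy's low-level wrappers (assuming scipy is available): the trans_a/trans_b flags take 0, 1, or 2 for no transpose, transpose, and conjugate transpose respectively, so A @ B.H and A.H @ B never require forming the conjugate explicitly at the BLAS level.

```python
import numpy as np
from scipy.linalg import blas

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
B = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))

# trans_a=2 asks gemm to use A's conjugate transpose implicitly:
C = blas.zgemm(1.0, A, B, trans_a=2)   # computes A.conj().T @ B
assert C.shape == (4, 5)
assert np.allclose(C, A.conj().T @ B)
```

(The wrappers may still copy C-ordered inputs to Fortran order internally, so this only illustrates the semantics numpy could expose, not a zero-copy guarantee.)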

@jakirkham
Contributor Author

Let me see if I understand you correctly @njsmith: you're saying we feed the conjugate flag into gemm. You are right that the transpose argument allows a conjugate transpose to be specified as well. Yeah, I forgot about this. I take my previous comment back.

Thinking on this, I propose the following adjustment to your idea. Add keyword arguments to dot for the first and second argument that default to None (or something like this), and allow them to take "T" or "H". These can then all be directly fed to the BLAS in various forms. Even "T" might see a mild speedup over what we have already done as it won't even need to construct the transpose object. This is still a very rough idea though and may need a bit more kicking around before it takes a reasonable shape.
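A rough pure-Python sketch of those per-operand flags (dot_flagged and the a_op/b_op names are hypothetical; the point is that all three values map directly onto gemm's trans arguments):

```python
import numpy as np

# None, "T", and "H" correspond to gemm trans flags 0, 1, and 2.
_OPS = {None: lambda x: x, "T": lambda x: x.T, "H": lambda x: x.conj().T}

def dot_flagged(a, b, a_op=None, b_op=None):
    """Hypothetical sketch: apply the requested (conjugate) transpose to
    each operand; a real implementation would forward flags to BLAS."""
    return _OPS[a_op](a) @ _OPS[b_op](b)

A = np.array([[1 + 1j, 2 - 1j], [0, 3]])
B = np.array([[2, 1j], [1 - 1j, 0]])
assert np.allclose(dot_flagged(A, B, a_op="H"), A.conj().T @ B)
assert np.allclose(dot_flagged(A, A, b_op="H"), A @ A.conj().T)
```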

@ewmoore
Contributor
ewmoore commented Jan 7, 2016

I'm a little torn now, truthfully, given @mhvk's reminder about __numpy_ufunc__. It would be nice not to make that any more complicated than it needs to be already.

@jakirkham
Contributor Author

I think we should pick this up again at some point, but it sounds like there is a 1.11.0 release in the process of being made. Let's pick this up again after that.

@jakirkham
Contributor Author

Any more thoughts on this?

@rileyjmurray

Joining the thread. I found this GitHub issue by looking into the behavior of vdot. I hoped it would allow me to perform operations like a.H @ b without the extra memory allocation of a.conjugate(). Alas, I see that vdot flattens its inputs, and that this has been known for a while.

I don't really have opinions on how to enable this kind of functionality. Anything that bypasses calling scipy.linalg.blas functions would be appreciated for my purposes.
