-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
BENCH, DOC: Benchmark matmul and update documentation #7034
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…run in the benchmarking suite.
@njsmith, I remember we were talking awhile back about the code path that |
Oh, cool, then I misremembered. Sorry about that.
|
Previously, ``gemm`` BLAS operations were used for all matrix products. Now, 8000 | ||
if the matrix product is between a matrix and its transpose, it will use | ||
``syrk`` BLAS operations for a performance boost. | ||
``syrk`` BLAS operations for a performance boost. This optimization has been | ||
extended to ``numpy.dot``, ``numpy.inner``, and ``numpy.matmul``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe list @
here explicitly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. Added.
Otherwise lgtm and should get into 1.11 since it's mostly a release note update. |
…n has been extended to several NumPy operations.
e927bce
to
1504975
Compare
No worries. It's always nice to get something for free. :) |
Lgtm |
Though it does make me wonder how many other functions are optimized that we weren't previously aware of. |
So, Travis checks out. Does AppVeyor even run the benchmarks? |
No ;) AppVeyor seems to have a Saturday hangover. |
AppVeyor is running 30 min/build and 12 builds behind, so about 6 hours to catch up. it's turning into a real bottleneck. |
@charris I think it's OK to merge things unless they have hairy Windows-specific stuff in them. if something fails it'll still show up in one of the later builds (which is a huge improvement over what we had a few weeks ago). If it stays like this we can solve it with a little bit of money. |
The queue here is indeed pretty daunting to look at! https://ci.appveyor.com/project/charris/numpy/history But yeah, waiting a little longer seems reasonable too |
BENCH, DOC: Benchmark matmul and update documentation
Well, that's 30 minutes less to wait. Thanks @jakirkham . |
Sounds good to me. :) Thanks everyone. If it turns out AppVeyor fails (not sure what I would have done to cause it though), please just ping me. |
I think @matthew-brett mentioned that we might be able to just ask AppVeyor for more job bandwidth and get it for free. ( #7020 (comment) ) |
@jakirkham you can just run |
Yes, I am sure I can. I was just hoping they could be published on the benchmarking website. If it's too much to ask, then it need not be done. Sorry, I should have been more explicit. |
ah ok, just checking that you were aware that you could run them. |
Well, actually, I just tried and I get this traceback.
|
Turns out there was no |
The benchmarking website is automated, and I'm unfortunately not volunteering to run anything manually. You can edit asv.conf.json to change virtualenv to conda. Or, check out the commits via git and use |
Ok, that's fine. Thanks for letting me know. I ran into more issues getting the comparison to work, but I will raise the issue on asv instead of here. |
See related issue here ( airspeed-velocity/asv#367 ). |
So, now I can get it to run. :) However, it says no benchmarks selected. Here is what I see.
Here is the diff applied to
|
Related: #6932
Based on the code path followed by
numpy.matmul
(35790f
@numpy/core/src/multiarray/multiarraymodule.c#L2408 ), it appears this should also benefit from thesyrk
optimization. Added some benchmarks to demonstrate that optimization. Also, updated the release documentation to note where thesyrk
optimization is used. Finally, resorted some benchmarks to match the order they appear at runtime.