BENCH, DOC: Benchmark matmul and update documentation #7034

jakirkham · 2016-01-16T19:00:39Z

Related: #6932

Based on the code path followed by numpy.matmul ( 35790f@numpy/core/src/multiarray/multiarraymodule.c#L2408 ), it appears matmul should also benefit from the syrk optimization. Added some benchmarks to demonstrate that optimization. Also, updated the release documentation to note where the syrk optimization is used. Finally, reordered some benchmarks to match the order in which they appear at runtime.
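For readers unfamiliar with the benchmark suite, a benchmark of this kind might look roughly like the sketch below. The class name, method names, and array sizes are illustrative assumptions, not the code added in this PR. The only convention relied on is that asv calls `setup` before running methods whose names start with `time_`.

```python
import numpy as np


class MatmulSyrk:
    """Hypothetical asv-style benchmark; names and sizes are assumptions."""

    def setup(self):
        # asv runs setup() before each timed method.
        rng = np.random.RandomState(0)
        self.a = rng.rand(150, 400)
        self.at = self.a.T           # a view on the same buffer as self.a
        self.b = rng.rand(400, 150)  # unrelated operand for a general product

    def time_matmul_a_b(self):
        # General matrix product between two unrelated arrays.
        np.matmul(self.a, self.b)

    def time_matmul_a_at(self):
        # Matrix times its own transpose (the case the syrk path targets).
        np.matmul(self.a, self.at)

    def time_matmul_at_a(self):
        # Transpose times the matrix, the other ordering of the same case.
        np.matmul(self.at, self.a)
```

Keeping a general product next to the transpose cases gives a baseline to compare against within the same run.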


jakirkham · 2016-01-16T19:16:21Z

@njsmith, I remember we were talking a while back about the code path that matmul follows. So, I figured I should take a look at the source code. It appears to be calling cblas_matrixproduct, which we already optimized to use syrk. ( #6932 ) Since we already get that speedup, I figured we should better document it and maybe add some benchmarks.
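As a quick sanity check, the different spellings of the product can be compared numerically. This is a minimal sketch with arbitrary array sizes. It only checks that the results agree and are symmetric; it does not show which BLAS routine is called, since that depends on the NumPy version and the BLAS it was built against.

```python
import numpy as np

rng = np.random.RandomState(0)
a = rng.rand(50, 20)

# Three spellings of the same product A.T @ A.
via_dot = np.dot(a.T, a)
via_matmul = np.matmul(a.T, a)
via_operator = a.T @ a  # the @ operator (Python 3.5+) maps to matmul

# Reference computed from independent copies, so the operands
# no longer share a buffer with each other.
reference = np.dot(a.T.copy(), a.copy())

print(np.allclose(via_dot, reference))
print(np.allclose(via_matmul, reference))
print(np.allclose(via_operator, reference))
# A.T @ A is symmetric by construction.
print(np.allclose(via_dot, via_dot.T))
```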

njsmith · 2016-01-16T19:18:06Z

Oh, cool, then I misremembered. Sorry about that.

njsmith · 2016-01-16T19:19:05Z

doc/release/1.11.0-notes.rst

 Previously, ``gemm`` BLAS operations were used for all matrix products. Now,
 if the matrix product is between a matrix and its transpose, it will use
-``syrk`` BLAS operations for a performance boost.
+``syrk`` BLAS operations for a performance boost. This optimization has been
+extended to ``numpy.dot``, ``numpy.inner``, and ``numpy.matmul``.


Maybe list @ here explicitly?

Good call. Added.

njsmith · 2016-01-16T19:19:42Z

Otherwise lgtm and should get into 1.11 since it's mostly a release note update.


jakirkham · 2016-01-16T19:22:01Z

No worries. It's always nice to get something for free. :)

njsmith · 2016-01-16T19:24:48Z

Lgtm

jakirkham · 2016-01-16T19:26:48Z

Though it does make me wonder how many other functions are optimized that we weren't previously aware of.

jakirkham · 2016-01-16T20:24:24Z

So, Travis checks out. Does AppVeyor even run the benchmarks?

charris · 2016-01-16T21:04:55Z

No ;) AppVeyor seems to have a Saturday hangover.

charris · 2016-01-16T21:11:14Z

AppVeyor is running at 30 min/build and is 12 builds behind, so about 6 hours to catch up. It's turning into a real bottleneck.

rgommers · 2016-01-16T21:14:51Z

@charris I think it's OK to merge things unless they have hairy Windows-specific stuff in them. If something fails, it'll still show up in one of the later builds (which is a huge improvement over what we had a few weeks ago).

If it stays like this we can solve it with a little bit of money.

njsmith · 2016-01-16T21:21:25Z

The queue here is indeed pretty daunting to look at! https://ci.appveyor.com/project/charris/numpy/history

But yeah, waiting a little longer seems reasonable too

BENCH, DOC: Benchmark matmul and update documentation

charris · 2016-01-16T21:31:05Z

Well, that's 30 minutes less to wait. Thanks @jakirkham .

jakirkham · 2016-01-16T21:57:14Z

Sounds good to me. :) Thanks everyone. If it turns out AppVeyor fails (not sure what I would have done to cause it though), please just ping me.

jakirkham · 2016-01-16T22:13:48Z

I think @matthew-brett mentioned that we might be able to just ask AppVeyor for more job bandwidth and get it for free. ( #7020 (comment) )

jakirkham · 2016-01-19T15:50:14Z

@pv, would it be possible, if it's not too much trouble, to run these benchmarks through airspeed velocity comparing ( a7377d8 ) to ( 25c8d1c )? There should be a roughly 2x speedup between the two. Both commits precede this one by ~10 days.

rgommers · 2016-01-19T18:11:48Z

@jakirkham you can just run python runtests.py --bench-compare a7377d8 25c8d1c and get the results, right? Or is that not working?

jakirkham · 2016-01-19T18:25:55Z

Yes, I am sure I can. I was just hoping they could be published on the benchmarking website. If it's too much to ask, then it need not be done. Sorry, I should have been more explicit.

rgommers · 2016-01-19T19:29:47Z

ah ok, just checking that you were aware that you could run them.

jakirkham · 2016-01-19T19:35:33Z

Well, actually, I just tried and I get this traceback.

Traceback (most recent call last):
  File "runtests.py", line 462, in <module>
  File "runtests.py", line 240, in main
  File "/opt/conda/lib/python2.7/os.py", line 346, in execvp
    _execvpe(file, args)
  File "/opt/conda/lib/python2.7/os.py", line 382, in _execvpe
    func(fullname, *argrest)
OSError: [Errno 2] No such file or directory

jakirkham · 2016-01-19T19:43:51Z

Turns out there was no asv. After installing it and virtualenv, things flamed out, but I expect that is because conda and virtualenv don't play nice. Is there some way for me to override the virtualenv install?
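A bare OSError like the one in the traceback above gives no hint about which executable is missing. A small pre-flight check along these lines can make that clearer. This is only a sketch: the helper name is made up for illustration, and it assumes Python 3.3+ for `shutil.which` (the session above was on Python 2.7).

```python
import shutil


def find_missing_tools(tools=("asv",)):
    """Return the names in `tools` that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


missing = find_missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("asv appears to be installed")
```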

pv · 2016-01-19T20:19:58Z

The benchmarking website is automated, and I'm unfortunately not volunteering to run anything manually.

You can edit asv.conf.json to change virtualenv to conda. Or, check out the commits via git and use --bench instead of --bench-compare and compare the timings manually. Benchmarking the currently checked out code can be done without needing asv to manage build/installation.
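For a manual comparison outside asv, a standalone timing script can be run once against each build and the numbers compared by hand. The sketch below is an assumption about how one might do that, not part of NumPy's benchmark suite. The function name, array sizes, and repeat counts are arbitrary.

```python
import timeit

import numpy as np


def time_products(n_rows=400, n_cols=150, repeat=5, number=20):
    """Best-of-`repeat` time per call for A.T @ A spelled three ways."""
    rng = np.random.RandomState(0)
    a = rng.rand(n_rows, n_cols)
    candidates = {
        "dot": lambda: np.dot(a.T, a),
        # inner contracts over the last axes, so this is also A.T @ A.
        "inner": lambda: np.inner(a.T, a.T),
        "matmul": lambda: np.matmul(a.T, a),
    }
    return {
        name: min(timeit.repeat(fn, repeat=repeat, number=number)) / number
        for name, fn in candidates.items()
    }


if __name__ == "__main__":
    # Print the version so results from different builds are easy to tell apart.
    print("numpy", np.__version__)
    for name, seconds in time_products().items():
        print("%-8s %.3e s per call" % (name, seconds))
```

Taking the minimum over several repeats reduces noise from other processes. The absolute numbers still depend on the machine and the BLAS library in use.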

jakirkham · 2016-01-19T21:21:03Z

Ok, that's fine. Thanks for letting me know.

I ran into more issues getting the comparison to work, but I will raise the issue on asv instead of here.

jakirkham · 2016-01-19T21:26:53Z

See related issue here ( airspeed-velocity/asv#367 ).

jakirkham · 2016-01-19T21:38:12Z

So, now I can get it to run. :) However, it says no benchmarks selected. Here is what I see.

$ python runtests.py --bench-compare a7377d8 25c8d1c 
********************************************************************************
WARNING: you have uncommitted changes --- these will NOT be benchmarked!
********************************************************************************
· Fetching recent changes
· Fetching recent changes
· Creating environments
· Discovering benchmarks
·· Uninstalling from py2.7-six
·· Installing into py2.7-six.
· No benchmarks selected

Here is the diff applied to asv.conf.json.

diff --git a/benchmarks/asv.conf.json b/benchmarks/asv.conf.json
index d837b0d..d8dc4c3 100644
--- a/benchmarks/asv.conf.json
+++ b/benchmarks/asv.conf.json
@@ -28,7 +28,7 @@
     // If missing or the empty string, the tool will be automatically
     // determined by looking for tools on the PATH environment
     // variable.
-    "environment_type": "virtualenv",
+    "environment_type": "conda",

     // the base URL to show a commit for the project.
     "show_commit_url": "https://github.com/numpy/numpy/commit/",

jakirkham added 2 commits January 16, 2016 13:50

BENCH: Reorganize existing benchmarks by the order they show up when …

4c50407

…run in the benchmarking suite.

BENCH: Add some benchmarks for matmul.

e5b108c

charris added 01 - Enhancement component: numpy._core component: benchmarks labels Jan 16, 2016

njsmith reviewed Jan 16, 2016

DOC: Update the release notes to state that the A.T @ A optimizatio…

1504975

…n has been extended to several NumPy operations.

jakirkham force-pushed the bench_matmul branch from e927bce to 1504975 Compare January 16, 2016 19:21

charris added a commit that referenced this pull request Jan 16, 2016

Merge pull request #7034 from jakirkham/bench_matmul

d65d871

BENCH, DOC: Benchmark matmul and update documentation

charris merged commit d65d871 into numpy:master Jan 16, 2016

jakirkham deleted the bench_matmul branch January 16, 2016 21:55

jakirkham mentioned this pull request Jan 17, 2016

ENH: Use syrk to compute certain dot products more quickly and accurately #6932

Merged
