Add benchmarks #16723

Closed
rth opened this issue Mar 19, 2020 · 8 comments · Fixed by #17026
Comments

@rth
Member
rth commented Mar 19, 2020

We should add a folder with airspeed velocity (ASV) benchmarks inside scikit-learn.

ASV benchmarks written by @jeremiedbb exist in https://github.com/jeremiedbb/scikit-learn_benchmarks, but they currently don't get enough exposure (i.e. contributors are not aware of them). I think we should include them, e.g. under an asv_bench/ folder, and remove the redundant scripts from the benchmarks/ folder.

I just made a PR to pandas, and it's nice that reviewers there can ask for an ASV benchmark to be added to a PR claiming to improve performance. For this to work, the benchmarks need to live inside the scikit-learn repo. Whether we run them somewhere, and where, is a secondary concern IMO.

For the docs we can "steal" part of the section from the pandas dev guide: https://pandas.pydata.org/docs/development/contributing.html#running-the-performance-test-suite
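
To make this concrete, a minimal ASV time benchmark for a scikit-learn estimator could look roughly like the sketch below (the file path, class name, and estimator choice are purely illustrative, not taken from the existing suite):

```python
# asv_bench/benchmarks/bench_logistic.py  (hypothetical location)
import numpy as np
from sklearn.linear_model import LogisticRegression


class LogisticRegressionSuite:
    """ASV times every method whose name starts with ``time_``."""

    def setup(self):
        # ASV calls setup() before running the benchmark: build a synthetic dataset.
        rng = np.random.RandomState(0)
        self.X = rng.rand(10_000, 50)
        self.y = rng.randint(0, 2, size=10_000)

    def time_fit(self):
        LogisticRegression(max_iter=100).fit(self.X, self.y)
```

ASV discovers such classes from the benchmark directory declared in asv.conf.json, so contributors would only need to drop a new file in that folder.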

@smarie
Contributor
smarie commented Apr 7, 2020

(Copied from #16864 as requested)

Following PR #13511, it appears that there is no reference benchmark for SVMs in scikit-learn or in any side project (sklearn-contrib).

This seems quite risky in the long run; maybe we should create one, especially to quantify the impact of changes to C code such as in PR #13511.

I have worked quite a bit on creating reference benchmarks over the past years, which led to two tools in the pytest ecosystem: pytest-cases and pytest-harvest, with the beginning of a tutorial here (outdated concerning the cases_fixtures usage, but it still shows how to compare the results of various regressors on a set of 6 datasets). I can therefore certainly try to help with a benchmark framework structure if you find the idea interesting.
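
As a rough illustration of that pattern (the case functions, dataset parameters, and metric below are purely illustrative), combining the two plugins could look like this:

```python
# Sketch only: parametrize an accuracy check over several dataset "cases"
# and collect the scores with pytest-harvest's results_bag fixture.
from pytest_cases import parametrize_with_cases
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def case_easy():
    return make_classification(n_samples=500, random_state=0)


def case_noisy():
    return make_classification(n_samples=500, flip_y=0.3, random_state=0)


@parametrize_with_cases("dataset", cases=".")
def test_logreg_accuracy(dataset, results_bag):
    X, y = dataset
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    # results_bag entries can later be aggregated into a comparison table.
    results_bag.accuracy = accuracy_score(y_te, clf.predict(X_te))
```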

However, I do not know of a good set of reference datasets to start with (apart from creating challenging ones "by hand").

@smarie
Contributor
smarie commented Apr 7, 2020

Note: this ticket should probably be renamed to "adding benchmarks to scikit-learn", to address the whole picture.

In particular I see at least two kinds of benchmarks:

  • applicative = performance in terms of accuracy on challenging datasets.
  • speed = pure speed performance, to support work on code optimizations.

Of course both kinds could be run on single or multiple models. For example, classification problems can be tackled by many models in scikit-learn; those would end up in the same benchmark.

If this does not fall within the scope of this ticket, please reopen #16864.

@NicolasHug
Member

+1 for adding benchmarks; regressions are very easy to introduce with Cython (and even more so if we ever want to use numba one day).

How does asv work in terms of workflow? Does it automatically compare a branch with master, or do PR authors need to run the benchmarks on both branches manually?

> applicative = performance in terms of accuracy on challenging datasets

We typically try to cover these in the tests, but yes, for long-running ones, benchmarks may help too.

rth changed the title from "Add ASV benchmarks" to "Adding benchmarks to scikit-learn" on Apr 7, 2020
rth changed the title from "Adding benchmarks to scikit-learn" to "Adding benchmarks" on Apr 7, 2020
rth changed the title from "Adding benchmarks" to "Add benchmarks" on Apr 7, 2020
@jeremiedbb
Member

> We should add a folder with airspeed velocity (ASV) benchmarks inside scikit-learn.

I think it's a good idea.

> I just made a PR to pandas, and it's nice that reviewers there can ask for an ASV benchmark to be added to a PR claiming to improve performance.

> ASV benchmarks written by @jeremiedbb exist in https://github.com/jeremiedbb/scikit-learn_benchmarks

I already tried that in a few PRs from people on the Inria team and it was helpful. Obviously I could discuss with them in person so they could easily use the benchmark suite, but if we document it properly it's not difficult to use.

I'd need to check that the benchmark suite is not outdated, and I'd like to make a few modifications before we move it here, if we decide to.

> Whether we run them somewhere, and where, is a secondary concern IMO.

+1

> I think we should include them, e.g. under an asv_bench/ folder

What's the procedure? Does it require a vote?

@rth
Member Author
rth commented Apr 7, 2020

> What's the procedure? Does it require a vote?

Thanks @jeremiedbb! Maybe an email on the public mailing list? It's purely a developer tool that doesn't affect users or the API, so I doubt it needs a SLEP. Don't hesitate to open a PR once you've made the changes you wanted.

> In particular I see at least two kinds of benchmarks:
> • applicative = performance in terms of accuracy on challenging datasets.
> • speed = pure speed performance, to support work on code optimizations.

@smarie Indeed, I had mostly run-time/memory benchmarks in mind here. However, if we go with ASV, it can also store custom metrics (e.g. model accuracy). The advantage is that it can compute them for commits in the history and identify regressions that way. By itself, the accuracy of a model on a problem is not meaningful, but if it changes a lot it would deserve additional investigation. Maybe we could start with run-time benchmarks and add model performance benchmarks in a second step. We do already test general (e.g. invariance) properties of estimators in the common tests.
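
For instance (names purely illustrative), an accuracy-tracking benchmark in ASV could look roughly like this:

```python
# Sketch of an ASV track_* benchmark storing a custom metric per commit.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


class LogisticRegressionAccuracy:
    def setup(self):
        X, y = make_classification(n_samples=5_000, random_state=0)
        self.X_train, self.X_test, self.y_train, self.y_test = train_test_split(
            X, y, random_state=0
        )

    # ASV records the return value of track_* methods for each commit,
    # so a sudden drop in accuracy shows up like a timing regression.
    def track_accuracy(self):
        clf = LogisticRegression(max_iter=1000).fit(self.X_train, self.y_train)
        return accuracy_score(self.y_test, clf.predict(self.X_test))

    track_accuracy.unit = "accuracy"
```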

> Does it automatically compare a branch with master, or do PR authors need to run the benchmarks on both branches manually?

Typically you want to run the benchmarks on HEAD, then on master, and show the difference. ASV allows you to do that in one command; see the pandas contributing guide, linked in the issue description, for more details. To give an example, the output could look something like:

```
$ asv continuous --bench SparseDataFrameConstructor upstream/master HEAD
[...]
[100.00%] ··· sparse.SparseDataFrameConstructor.time_from_scipy    117±1ms
       before           after         ratio
     [dbd7a5d3]       [508fda51]
     <master>         <opt-sparse-init>
-         117±1ms      26.8±0.07ms     0.23  sparse.SparseDataFrameConstructor.time_from_scipy

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.
```

@rth
Member Author
rth commented Apr 7, 2020

Also, once this is resolved, we might want to archive and mark as no longer maintained https://github.com/scikit-learn/scikit-learn-speed and https://github.com/scikit-learn/ml-benchmarks.

@jeremiedbb
Member

> applicative = performance in terms of accuracy on challenging datasets.
> speed = pure speed performance, to support work on code optimizations.

The benchmark suite allows accuracy checks as well. For now I just check that some metric (depending on the model) and/or the predicted values stay exactly the same between two commits.
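
Not the actual suite code, but the idea of pinning predictions between two commits can be sketched as another track_* benchmark whose value only changes when the predictions themselves change:

```python
# Illustrative sketch: ASV flags any commit where this scalar summary
# of the predictions differs from the previous one.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge


class RidgePredictionStability:
    def setup(self):
        self.X, self.y = make_regression(n_samples=2_000, random_state=0)

    def track_prediction_checksum(self):
        pred = Ridge().fit(self.X, self.y).predict(self.X)
        return float(np.round(pred, 6).sum())
```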

@jeremiedbb
Member

I made the changes I wanted. Changes in asv broke a feature, and I thought there was no workaround, but I just found one, so it's fine. I opened #17026. The documentation is still missing.
