Add benchmarks #16723

Closed
rth opened this issue Mar 19, 2020 · 8 comments · Fixed by #17026
Comments

@rth
Member
rth commented Mar 19, 2020

We should add a folder with airspeed velocity (ASV) benchmarks inside scikit-learn.

ASV benchmarks written by @jeremiedbb exist in https://github.com/jeremiedbb/scikit-learn_benchmarks, but they currently don't get enough exposure (i.e. contributors are not aware of them). I think we should include them, e.g. under an asv_bench/ folder, and remove the redundant scripts from the benchmarks/ folder.

I just made a PR to pandas, and it's nice that reviewers there can ask for an ASV benchmark to be added to a PR claiming to improve performance. For this to work, the benchmarks need to live inside the scikit-learn repo. Whether we run them somewhere, and where, is a secondary concern IMO.

For the docs we can "steal" part of the section from the pandas dev guide: https://pandas.pydata.org/docs/development/contributing.html#running-the-performance-test-suite
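
To make this concrete, a minimal ASV time benchmark for a scikit-learn estimator could look roughly like the sketch below (the file path, class name, and estimator choice are purely illustrative, not taken from the existing suite):

```python
# asv_bench/benchmarks/bench_logistic.py  (hypothetical location)
import numpy as np
from sklearn.linear_model import LogisticRegression


class LogisticRegressionSuite:
    """ASV times every method whose name starts with ``time_``."""

    def setup(self):
        # ASV calls setup() before running the benchmark: build a synthetic dataset.
        rng = np.random.RandomState(0)
        self.X = rng.rand(10_000, 50)
        self.y = rng.randint(0, 2, size=10_000)

    def time_fit(self):
        LogisticRegression(max_iter=100).fit(self.X, self.y)
```

ASV discovers such classes from the benchmark directory declared in asv.conf.json, so contributors would only need to drop a new file in that folder.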

@smarie
Contributor
smarie commented Apr 7, 2020

(Copied from #16864 as requested)

Following PR #13511, it appears that there is no reference benchmark for SVMs in scikit-learn or in any side project (sklearn-contrib).

This seems quite risky in the long run; maybe we should create one, especially to quantify the impact of changes to C code such as in PR #13511.

I have worked quite a bit on creating reference benchmarks over the past years, which led to two tools in the pytest ecosystem: pytest-cases and pytest-harvest, with the beginning of a tutorial here (outdated concerning the cases_fixtures usage, but it still shows how to compare the results of various regressors on a set of 6 datasets). I can therefore certainly try to help with a benchmark framework structure if you find the idea interesting.
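
As a rough illustration of that pattern (the case functions, dataset parameters, and metric below are purely illustrative), combining the two plugins could look like this:

```python
# Sketch only: parametrize an accuracy check over several dataset "cases"
# and collect the scores with pytest-harvest's results_bag fixture.
from pytest_cases import parametrize_with_cases
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def case_easy():
    return make_classification(n_samples=500, random_state=0)


def case_noisy():
    return make_classification(n_samples=500, flip_y=0.3, random_state=0)


@parametrize_with_cases("dataset", cases=".")
def test_logreg_accuracy(dataset, results_bag):
    X, y = dataset
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    # results_bag entries can later be aggregated into a comparison table.
    results_bag.accuracy = accuracy_score(y_te, clf.predict(X_te))
```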

However, I do not know of a good set of reference datasets to start with (apart from creating challenging ones "by hand").

@smarie
Contributor
smarie commented Apr 7, 2020

Note: this ticket should probably be renamed to "adding benchmarks to scikit-learn", to address the whole picture.

In particular I see at least two kinds of benchmarks:

  • applicative = performance in terms of accuracy on challenging datasets.
  • speed = pure speed performance, to support work on code optimizations.

Of course both kinds could be run on single or multiple models. For example, classification problems can be tackled by many models in scikit-learn; those would end up in the same benchmark.

If this does not fall within the scope of this ticket, please reopen #16864.

@NicolasHug
Member

+1 for adding benchmarks; regressions are very easy to introduce with Cython (and even more so if we ever want to use numba one day).

How does asv work in terms of workflow? Does it automatically compare a branch with master, or do PR authors need to run the benchmarks on both branches manually?

> applicative = performance in terms of accuracy on challenging datasets

We typically try to cover these in the tests, but yes, for long-running ones, benchmarks may help too.

rth changed the title from "Add ASV benchmarks" to "Adding benchmarks to scikit-learn" on Apr 7, 2020
rth changed the title from "Adding benchmarks to scikit-learn" to "Adding benchmarks" on Apr 7, 2020
rth changed the title from "Adding benchmarks" to "Add benchmarks" on Apr 7, 2020
@jeremiedbb
Member

> We should add a folder with airspeed velocity (ASV) benchmarks inside scikit-learn.

I think it's a good idea.

> I just made a PR to pandas, and it's nice that reviewers there can ask for an ASV benchmark to be added to a PR claiming to improve performance.

> ASV benchmarks written by @jeremiedbb exist in https://github.com/jeremiedbb/scikit-learn_benchmarks

I already tried that in a few PRs from people on the Inria team and it was helpful. Obviously I could discuss with them in person so they could easily use the benchmark suite, but if we document it properly it's not difficult to use.

I'd need to check that the benchmark suite is not outdated, and I'd like to make a few modifications before we move it here, if we decide to.

> Whether we run them somewhere, and where, is a secondary concern IMO.

+1

> I think we should include them, e.g. under an asv_bench/ folder

What's the procedure? Does it require a vote?

@rth
Member Author
rth commented Apr 7, 2020

> What's the procedure? Does it require a vote?

Thanks @jeremiedbb! Maybe an email on the public mailing list? It's purely a developer tool that doesn't affect users or the API, so I doubt it needs a SLEP. Don't hesitate to open a PR once you've made the changes you wanted.

> In particular I see at least two kinds of benchmarks:
> • applicative = performance in terms of accuracy on challenging datasets.
> • speed = pure speed performance, to support work on code optimizations.

@smarie Indeed, I had mostly run-time/memory benchmarks in mind here. However, if we go with ASV, it can also store custom metrics (e.g. model accuracy). The advantage is that it can compute them for commits in the history and identify regressions that way. By itself, the accuracy of a model on a problem is not meaningful, but if it changes a lot it would deserve additional investigation. Maybe we could start with run-time benchmarks and add model performance benchmarks in a second step. We do already test general (e.g. invariance) properties of estimators in the common tests.
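
For instance (names purely illustrative), an accuracy-tracking benchmark in ASV could look roughly like this:

```python
# Sketch of an ASV track_* benchmark storing a custom metric per commit.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


class LogisticRegressionAccuracy:
    def setup(self):
        X, y = make_classification(n_samples=5_000, random_state=0)
        self.X_train, self.X_test, self.y_train, self.y_test = train_test_split(
            X, y, random_state=0
        )

    # ASV records the return value of track_* methods for each commit,
    # so a sudden drop in accuracy shows up like a timing regression.
    def track_accuracy(self):
        clf = LogisticRegression(max_iter=1000).fit(self.X_train, self.y_train)
        return accuracy_score(self.y_test, clf.predict(self.X_test))

    track_accuracy.unit = "accuracy"
```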

> Does it automatically compare a branch with master, or do PR authors need to run the benchmarks on both branches manually?

Typically you want to run the benchmarks on HEAD, then on master, and show the difference. ASV allows you to do that in one command; see the pandas contributing guide, linked in the issue description, for more details. To give an example, the output could look something like:

```
$ asv continuous --bench SparseDataFrameConstructor upstream/master HEAD
[...]
[100.00%] ··· sparse.SparseDataFrameConstructor.time_from_scipy    117±1ms
       before           after         ratio
     [dbd7a5d3]       [508fda51]
     <master>         <opt-sparse-init>
-         117±1ms      26.8±0.07ms     0.23  sparse.SparseDataFrameConstructor.time_from_scipy

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.
```

@rth
Member Author
rth commented Apr 7, 2020

Also, once this is resolved, we might want to archive and mark as no longer maintained https://github.com/scikit-learn/scikit-learn-speed and https://github.com/scikit-learn/ml-benchmarks.

@jeremiedbb
Member

> applicative = performance in terms of accuracy on challenging datasets.
> speed = pure speed performance, to support work on code optimizations.

The benchmark suite allows accuracy checks as well. For now I just check that some metric (depending on the model) and/or the predicted values stay exactly the same between two commits.
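
Not the actual suite code, but the idea of pinning predictions between two commits can be sketched as another track_* benchmark whose value only changes when the predictions themselves change:

```python
# Illustrative sketch: ASV flags any commit where this scalar summary
# of the predictions differs from the previous one.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge


class RidgePredictionStability:
    def setup(self):
        self.X, self.y = make_regression(n_samples=2_000, random_state=0)

    def track_prediction_checksum(self):
        pred = Ridge().fit(self.X, self.y).predict(self.X)
        return float(np.round(pred, 6).sum())
```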

@jeremiedbb
Member

I made the changes I wanted. Changes in asv broke a feature, and I thought there was no workaround, but I just found one, so it's fine. I opened #17026. The documentation is still missing.
