8000 ASV Add benchmarks for `PairwiseDistancesReductions` by jjerphan · Pull Request #24120 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

ASV Add benchmarks for PairwiseDistancesReductions #24120

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

jjerphan
Copy link
Member
@jjerphan jjerphan commented Aug 5, 2022

Reference Issues/PRs

Relates to #22587

What does this implement/fix? Explain your changes.

I think investing some time in coming up with a proper benchmarking suite for PairwiseDistancesReductions is sensible: I want preventing quick benchmark whose results might be misleading (this is want I have been doing until now thinking I could gain time but I in retrospective I think I've lost some).

Also, longer-term-wise, I would like us, our future selves and future maintainers to be able to easily and confidently perform benchmark between revisions in case changes need to be performed.

Any other comments?

We might want to generate plots on top of asv results.

The radius definition is from Olivier:

scikit-learn#22320 (review)

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@Micky774
Copy link
Contributor
Micky774 commented Aug 5, 2022

Already very helpful it seems :)

@jjerphan
Copy link
Member Author
jjerphan commented Aug 5, 2022

Thanks, @Micky774. I hope this will help you with future experimentations.

Copy link
Member
@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the original intention of the ASV benchmark suite was to continuously monitor user facing classes with a public API that remains stable across scikit-learn versions so as to detect performance regressions.

But given the fact that PairwiseDistancesArgKmin is and will be used quite pervasively in future scikit-learn version, maybe it's good to have a public benchmark even though this is not part of the public scikit-learn API.

So +0 in principle.

Alternatively we could keep on benchmark the public Python functions such as pairwise_argmin_min (even-though this is restricted to the case n_neighbors=1 instead of n_neighbors=10 which is more interesting) or KNeighborsClassifier / Regressors and their radius counterpart even though that will also include the classification/regression reduction operations on top of the neighbors computation which is not necessarily negligible.

@jjerphan how long does it take to run the full cartesian products of parameters of this PR?

Any opinion @jeremiedbb?

@jjerphan
Copy link
Member Author
jjerphan commented Aug 9, 2022

I think the original intention of the ASV benchmark suite was to continuously monitor user facing classes with a public API that remains stable across scikit-learn versions so as to detect performance regressions.

You're right. I think we should not pollute this ASV benchmark suite with this addition. Probably we can come up with another ASV benchmarks suite for private interfaces (in another repo)?

@jjerphan how long does it take to run the full cartesian products of parameters of this PR?

It can take quite a while for the datasets that are defined here. IIRC, hours on a 128 cores machines.

@ogrisel
Copy link
Member
ogrisel commented Aug 9, 2022

You're right. I think we should not pollute this ASV benchmark suite with this addition. Probably we can come up with another ASV benchmarks suite for private interfaces (in another repo)?

Or maybe a private/ subfolder under asv_benchmarks/ if the generic CRON job can be configured to skip the private benchmarks to avoid making them run slower.

It can take quite a while for the datasets that are defined here. IIRC, hours on a 128 cores machines.

We should definitely not run this in the weekly benchmark run that does not have this many cores.

@jjerphan
Copy link
Member Author

I am coming back to this issue because I need a proper benchmark suite for all the PairwiseDistancesReductions for several pull requests.

Yet, it feels like it would be better to have another distinct project for such a suite because this is rather niche and we probably do not want to maintain it as part of the scikit-learn repository.

What do you think?

In the meantime, I will turn this PR to draft and will maintain its branch on my fork based on my needs and will update when needed.

@jjerphan jjerphan marked this pull request as draft October 21, 2022 09:31
@jjerphan
Copy link
Member Author
jjerphan commented Dec 5, 2022

@jjerphan jjerphan closed this Dec 5, 2022
@jjerphan jjerphan deleted the asv/add-pdr-benchmarks branch December 5, 2022 10:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0