-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
ASV Add benchmarks for PairwiseDistancesReductions
#24120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The radius definition is from Olivier: scikit-learn#22320 (review) Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Already very helpful it seems :) |
Thanks, @Micky774. I hope this will help you with future experimentations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the original intention of the ASV benchmark suite was to continuously monitor user facing classes with a public API that remains stable across scikit-learn versions so as to detect performance regressions.
But given the fact that PairwiseDistancesArgKmin
is and will be used quite pervasively in future scikit-learn version, maybe it's good to have a public benchmark even though this is not part of the public scikit-learn API.
So +0 in principle.
Alternatively we could keep on benchmark the public Python functions such as pairwise_argmin_min
(even-though this is restricted to the case n_neighbors=1
instead of n_neighbors=10
which is more interesting) or KNeighborsClassifier
/ Regressors
and their radius counterpart even though that will also include the classification/regression reduction operations on top of the neighbors computation which is not necessarily negligible.
@jjerphan how long does it take to run the full cartesian products of parameters of this PR?
Any opinion @jeremiedbb?
You're right. I think we should not pollute this ASV benchmark suite with this addition. Probably we can come up with another ASV benchmarks suite for private interfaces (in another repo)?
It can take quite a while for the datasets that are defined here. IIRC, hours on a 128 cores machines. |
Or maybe a
We should definitely not run this in the weekly benchmark run that does not have this many cores. |
I am coming back to this issue because I need a proper benchmark suite for all the Yet, it feels like it would be better to have another distinct project for such a suite because this is rather niche and we probably do not want to maintain it as part of the scikit-learn repository. What do you think? In the meantime, I will turn this PR to draft and will maintain its branch on my fork based on my needs and will update when needed. |
Namely support for float32 and CSR matrices.
Closing in favor of https://github.com/jjerphan/pairwise-distances-reductions-asv-suite. |
Reference Issues/PRs
Relates to #22587
What does this implement/fix? Explain your changes.
I think investing some time in coming up with a proper benchmarking suite for
PairwiseDistancesReductions
is sensible: I want preventing quick benchmark whose results might be misleading (this is want I have been doing until now thinking I could gain time but I in retrospective I think I've lost some).Also, longer-term-wise, I would like us, our future selves and future maintainers to be able to easily and confidently perform benchmark between revisions in case changes need to be performed.
Any other comments?
We might want to generate plots on top of
asv
results.